Problems scheduling across zones

asked 2011-10-28 15:45:27 -0500

ryan-tidwell

I have been unable to get the distributed scheduler to successfully activate instances across zones. Looking at the logs for nova-scheduler, I see some errors that look suspicious, but I don't know what to make of them. After running "nova zone-boot --flavor 1 --image 5 zone-kvm1", I see the following in nova-scheduler.log:

2011-10-28 09:31:38,476 DEBUG nova.rpc [-] unpacked context: {'user_id': u'ostack', 'roles': [], 'timestamp': u'2011-10-28T15:31:38.035426', 'auth_token': None, 'msg_id': None, 'remote_address': u'127.0.0.1', 'strategy': u'noauth', 'is_admin': True, 'request_id': u'07c6275f-d1d2-42b4-a4f5-063d5bda2d6e', 'project_id': u'mgmt', 'read_deleted': False} from (pid=12030) _unpack_context /usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py:646
2011-10-28 09:31:38,478 WARNING nova.scheduler.manager [-] Driver Method schedule_run_instance missing: 'ZoneScheduler' object has no attribute 'schedule_run_instance'.Reverting to schedule()
2011-10-28 09:31:38,482 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE:     rval = node_func(context=ctxt, *node_args)
(nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 103, in _schedule
(nova.rpc): TRACE:     host = real_meth(args, **kwargs)
(nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/scheduler/zone.py", line 55, in schedule
(nova.rpc): TRACE:     raise driver.NoValidHost(_("Scheduler was unable to locate a host"
(nova.rpc): TRACE: NoValidHost: Scheduler was unable to locate a host for this request. Is the appropriate service running?
(nova.rpc): TRACE:
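The WARNING line and the traceback together describe a fallback: the scheduler manager looked for a driver method named schedule_run_instance, did not find it on ZoneScheduler, reverted to the driver's generic schedule(), and that method raised NoValidHost. A minimal Python sketch of that getattr-based fallback pattern, assuming simplified names: dispatch, GenericOnlyDriver, RunInstanceDriver and this NoValidHost class are hypothetical stand-ins, not nova's actual code.

```python
import logging

LOG = logging.getLogger("scheduler_sketch")


class NoValidHost(Exception):
    """Hypothetical stand-in for nova's NoValidHost exception."""


def dispatch(driver, method, *args, **kwargs):
    """Call driver.schedule_<method>() if defined, otherwise driver.schedule().

    Illustrates the fallback the WARNING line describes; not nova's code.
    """
    name = "schedule_%s" % method
    real_meth = getattr(driver, name, None)
    if real_meth is None:
        LOG.warning("Driver Method %s missing. Reverting to schedule()", name)
        return driver.schedule(method, *args, **kwargs)
    return real_meth(*args, **kwargs)


class GenericOnlyDriver:
    """Hypothetical driver that, like ZoneScheduler in the log, has only schedule()."""

    def __init__(self, local_hosts):
        self.local_hosts = list(local_hosts)

    def schedule(self, method, *args, **kwargs):
        # Only local hosts are considered in this sketch; child zones are never consulted.
        if not self.local_hosts:
            raise NoValidHost("Scheduler was unable to locate a host for this request.")
        return self.local_hosts[0]


class RunInstanceDriver(GenericOnlyDriver):
    """Hypothetical driver that implements the specific method directly."""

    def schedule_run_instance(self, *args, **kwargs):
        return "handled by schedule_run_instance"
```

With every local compute host disabled, dispatch(GenericOnlyDriver([]), "run_instance") raises NoValidHost, which is consistent with the traceback above: the generic path in this sketch has no notion of child zones.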

The child zone has been discovered successfully:

root@vela:/var/log/nova# nova zone-list
+----+-------+-----------+---------------------------------+---------------+--------------+
| ID | Name  | Is Active | API URL                         | Weight Offset | Weight Scale |
+----+-------+-----------+---------------------------------+---------------+--------------+
| 1  | zone1 | True      | http://192.168.1.20:8774/v1.1/  |               |              |
+----+-------+-----------+---------------------------------+---------------+--------------+

Version information and config file:

root@vela:/var/log/nova# nova-manage version list
2011.3 (2011.3-nova-milestone-tarball:tarmac-20110922115702-k9nkvxqzhj130av2)

root@vela:/var/log/nova# more /etc/nova/nova.conf
--dhcpbridge_flagfile=/etc/nova/nova.conf
--dhcpbridge=/usr/bin/nova-dhcpbridge
--flat_network_dhcp_start=10.1.2.1
--network_host=10.1.0.1
--flat_network_bridge=br100
--flat_injected=False
--public_interface=eth1
--logdir=/var/log/nova
--state_path=/var/lib/nova
--lock_path=/var/lock/nova
--verbose
--sql_connection=mysql://nova@localhost/nova
--ec2_api=192.168.1.10
--ec2_url=http://192.168.1.10:8773/services/Cloud
--network_manager=nova.network.manager.FlatManager
--rabbit_host=192.168.1.10
--glance_api_servers=192.168.1.10:9292
--image_service=nova.image.glance.GlanceImageService
--zone_name=master
--allow_admin_api=true
--scheduler_driver=nova.scheduler.zone.ZoneScheduler
#--scheduler_driver=nova.scheduler.base_scheduler.BaseScheduler
--enable_zone_routing=true
#--zone_capabilties
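Only one scheduler_driver line is uncommented, so ZoneScheduler is the driver actually in effect here. A rough Python sketch of checking that mechanically, assuming the file holds one --name=value or bare --name flag per line and that lines starting with # are comments; parse_flagfile is a hypothetical helper, not part of nova, and it ignores gflags details such as --noflag negation or nested flagfiles.

```python
def parse_flagfile(text):
    """Parse simple flagfile lines (--name=value or --name) into a dict.

    Lines starting with '#' are skipped as comments.
    Bare flags such as --verbose are recorded as True.
    Later occurrences of a flag overwrite earlier ones.
    """
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.startswith("--"):
            continue  # not a flag line; ignored in this sketch
        name, sep, value = line[2:].partition("=")
        flags[name] = value if sep else True
    return flags


SAMPLE = """\
--zone_name=master
--allow_admin_api=true
--scheduler_driver=nova.scheduler.zone.ZoneScheduler
#--scheduler_driver=nova.scheduler.base_scheduler.BaseScheduler
--enable_zone_routing=true
--verbose
"""

flags = parse_flagfile(SAMPLE)
print(flags["scheduler_driver"])  # nova.scheduler.zone.ZoneScheduler
```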

Has anyone encountered this before? I've seen this same behavior in the Diablo packages shipped with Ubuntu 11.10.

6 answers

answered 2011-10-28 17:19:21 -0500

You have to wait about 30 seconds for the compute nodes to send their first updates to the schedulers. Until then, the schedulers don't know the nodes exist.

That's resolved in a pending branch, which includes some big scheduler changes:

https://review.openstack.org/#change,1192
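The point about the roughly 30-second wait is that a scheduler can only choose among hosts that have already reported in. A toy Python model of that behavior, assuming a simple in-memory table; CapabilityCache and pick_host are hypothetical names, and nova's real scheduler tracks far more than a timestamp per host.

```python
import time


class CapabilityCache:
    """Hypothetical stand-in for the scheduler's view of compute hosts.

    A host becomes schedulable only after it has sent at least one update.
    """

    def __init__(self):
        self._last_update = {}  # host name -> timestamp of its most recent report

    def report(self, host, now=None):
        """Record an update from a compute host (now defaults to the current time)."""
        self._last_update[host] = time.time() if now is None else now

    def known_hosts(self):
        return sorted(self._last_update)


def pick_host(cache):
    """Return some known host, or fail the way a scheduler with no candidates would."""
    hosts = cache.known_hosts()
    if not hosts:
        raise LookupError("no host has reported yet")  # analogous to NoValidHost
    return hosts[0]  # trivial choice; a real scheduler would weigh capabilities
```

Before any report() call, pick_host() fails; after report("compute1"), it returns "compute1".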

answered 2011-10-28 17:33:35 -0500

ryan-tidwell

Thanks for the quick response. Could you elaborate a little? My setup has been online for several days. I'm trying to force the parent zone (master in this case) to activate an instance in my child zone (zone1). For the moment, I have disabled all nova-compute hosts in the master zone so that the scheduler will delegate to zone1. The error message I attached shows up only after attempting a zone-boot operation. Maybe I'm not understanding how the distributed scheduler works, but I'm assuming it will see no available resources in the master zone and delegate the request to zone1 through a Nova API call. Not only does this error appear immediately in the nova-scheduler log, but I also see no trace of any attempt to activate an instance in zone1. From the description of the linked bug, I don't see how it addresses my issue, and I still don't see what I'm doing wrong.

answered 2011-10-28 21:05:13 -0500

If you run 'nova zone-list' in the parent zone, does it show the child as active?

It could be that the parent thinks the child is offline (usually due to a novaclient versioning problem).
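One way to script the check suggested here is to parse the zone-list table and flag any child zone not reported as active. A Python sketch, assuming novaclient's prettytable-style output as pasted in this thread; parse_table and inactive_zones are hypothetical helpers.

```python
def parse_table(text):
    """Parse a prettytable-style listing into a list of {header: cell} dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        # Drop the outer pipes, then split on the inner column separators.
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def inactive_zones(text):
    """Return the names of zones whose 'Is Active' column is not 'True'."""
    return [r["Name"] for r in parse_table(text) if r.get("Is Active") != "True"]
```

Fed the zone-list output from the question, inactive_zones() returns an empty list, since zone1 shows True.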

answered 2011-10-28 21:23:12 -0500

ryan-tidwell

Yes, the parent zone sees the child as active:

root@vela:/var/log/nova# nova zone-list
+----+-------+-----------+---------------------------------+---------------+--------------+
| ID | Name  | Is Active | API URL                         | Weight Offset | Weight Scale |
+----+-------+-----------+---------------------------------+---------------+--------------+
| 1  | zone1 | True      | http://192.168.1.20:8774/v1.1/  |               |              |
+----+-------+-----------+---------------------------------+---------------+--------------+

Am I using the correct scheduler driver? I came across some documentation that leads me to believe that ZoneScheduler is used for availability zones, which are different from plain "zones". Should I be using BaseScheduler instead?

As a side note, things initially seem better with BaseScheduler, but there is still no invocation of the scheduler in the child zone. Interestingly, about 20-30 minutes after restarting the services with BaseScheduler, invocations of "nova zone-list" begin to hang indefinitely.
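While tracking down a hang like the one with "nova zone-list", wrapping the call in a timeout turns an indefinite hang into a detectable failure that a script can log or retry. A minimal sketch using Python's standard subprocess module (Python 3.7+ for capture_output); run_with_timeout is a hypothetical helper and any timeout value you pass is an arbitrary choice.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout), or None if it exceeds timeout_s seconds.

    subprocess.run kills the child process when the timeout expires.
    A missing executable still raises FileNotFoundError.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout
```

For example, run_with_timeout(["nova", "zone-list"], 30) returns None instead of blocking forever if the command stops responding.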

answered 2011-10-28 23:40:40 -0500

OK, that's good, so they're talking.

No, ZoneScheduler is a different thing altogether (sadly, a bad choice of names); it has to do with EC2 availability zones. Try the abstract scheduler. You should then see /zones/select or /zones/info calls coming into the child zone's API logs.

/zones/info comes from the parent polling its children. /zones/select is called before the parent decides where to provision; if the child is chosen, you'll see it followed by POST /server/.

The scheduler logs should show you the decision making process.

There are some (somewhat older) docs on how the distributed scheduler works here: http://nova.openstack.org/devref/index.html
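The /zones/info, /zones/select, POST /server/ sequence described in this answer can be turned into a quick check against the child zone's API log. A Python sketch, assuming each relevant request appears on one log line containing its path (and, for the create call, the POST method); STAGES, seen_stages and diagnose are hypothetical names, and the regular expressions may need adjusting to the actual nova-api log format.

```python
import re

# One pattern per stage of the parent/child exchange described above.
STAGES = {
    "info": re.compile(r"/zones/info"),
    "select": re.compile(r"/zones/select"),
    # Accepts both "POST /server/" and a versioned path like "POST /v1.1/servers".
    "create": re.compile(r"POST\s+\S*/servers?"),
}


def seen_stages(log_lines):
    """Return the set of stage names whose pattern matches any log line."""
    found = set()
    for line in log_lines:
        for stage, pattern in STAGES.items():
            if pattern.search(line):
                found.add(stage)
    return found


def diagnose(found):
    """Map the observed stages to the interpretation given in the answer."""
    if "info" not in found:
        return "parent is not polling this child zone"
    if "select" not in found:
        return "parent polls the child but never asks it to select a host"
    if "create" not in found:
        return "child answered select but was not chosen to provision"
    return "child was asked to provision an instance"
```

A log that shows only /zones/info requests yields "parent polls the child but never asks it to select a host".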

answered 2011-11-01 15:39:19 -0500

ryan-tidwell

Thanks for the information. I'm seeing a /zones/info call on my child zone, but no call to /zones/select or POST /server/. I'm still not sure why this isn't working. You referenced a bug (https://review.openstack.org/#change,1192) in an earlier reply; what exactly about that bug would cause provisioning across zones to fail?

Stats

Asked: 2011-10-28 15:45:27 -0500

Seen: 44 times

Last updated: Nov 01 '11