Nova-scheduler overloads compute host when deploying multiple instances

asked 2013-04-16 15:39:40 -0500

ldbragst

Hello all,

I'm running into an issue with the nova-scheduler service (Filter Scheduler, enabled by default) and the Hyper-V compute driver. I am deploying several instances with a single 'nova boot' command across five nova-compute hosts. The scheduler starts handing out the instances to separate hosts and they start to come up. After a while I notice that some instances never boot on their scheduled host. Those instances have been rescheduled three times by the scheduler, so I would expect them to be reported to the control node in an 'ERROR' state. Instead, on the controller they hang in a 'Building' state even though they are never actually building on the compute nodes.

The part I am curious about is this (using compute-1, compute-2, and compute-3 as the compute node names): compute-1 and compute-2 have enough resources to handle the instances that fail to boot on compute-3, but the instances never get scheduled there. Either that, or all three scheduling attempts land on compute-3, where they fail because it is out of resources, and the request gives up after the third attempt. I checked the scheduler and compute logs and found that some of the resource values reported by the compute services are negative (for example, a negative disk value). After the scheduler weighs the hosts it always chooses compute-3, even though it isn't the best option.

Part of the problem looks like the scheduler is somehow getting faulty information about the resources on the Hyper-V compute nodes. I enabled verbose logging on the controller, and the scheduler log shows this during the periodic updates of resource info:

2013-04-15 16:06:05.460 6613 DEBUG nova.openstack.common.rpc.amqp [-] received {u'contextroles': [], u'contextrequestid': u'req-4ace290b-c5a9-4bf8-a706-2af1bbe37b50', u'contextquotaclass': None, u'contextprojectname': None, u'contextservicecatalog': [], u'contextusername': None, u'contextauthtoken': '<sanitized>', u'args': {u'servicename': u'compute', u'host': u'CN10.private.cloud.com', u'capabilities': [{u'hostmemoryfreecomputed': 4668, u'diskavailable': 241, u'supportedinstances': [[u'i686', u'hyperv', u'hvm'], [u'x8664', u'hyperv', u'hvm']], u'hostmemoryoverhead': 191912, u'hostip': u'127.0.0.1', u'hypervisorhostname': u'CN10', u'hostmemoryfree': 4668, u'disktotal': 558, u'hostmemorytotal': 196580, u'diskused': 317}]}, u'contexttenant': None, u'contextinstancelockchecked': False, u'contexttimestamp': u'2013-04-15T21:06:12.356000', u'contextisadmin': True, u'version': u'2.4', u'contextprojectid': None, u'contextuser': None, u'contextreaddeleted': u'no', u'contextuserid': None, u'method': u'updateservicecapabilities', u'contextremoteaddress': None} safelog /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/common.py:272

2013-04-15 16:06:05.461 6613 DEBUG nova.openstack.common.rpc.amqp [-] unpacked context: {'readdeleted': u'no', 'projectname': None, 'userid': None, 'roles': [], 'timestamp ...
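One way to narrow down where the negative values come from is to sanity-check each reported capabilities dictionary. The following is only a minimal sketch, not part of nova: `check_capabilities` is a hypothetical helper, the key names are spelled exactly as they appear in the log excerpt above (adjust them to whatever your deployment actually reports), and the `bad` report is made up for illustration.

```python
def check_capabilities(caps):
    """Return a list of human-readable problems found in one capabilities dict."""
    problems = []

    # Flag any numeric resource value that is negative.
    for key, value in caps.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value < 0:
            problems.append("%s is negative: %r" % (key, value))

    # Cross-check the disk figures if all three are present.
    disk_keys = ("disktotal", "diskused", "diskavailable")
    if all(k in caps for k in disk_keys):
        if caps["diskused"] + caps["diskavailable"] != caps["disktotal"]:
            problems.append("diskused + diskavailable does not equal disktotal")

    return problems


# Values copied from the CN10 entry in the log excerpt.
cn10 = {
    "hostmemoryfree": 4668,
    "hostmemorytotal": 196580,
    "hostmemoryoverhead": 191912,
    "disktotal": 558,
    "diskused": 317,
    "diskavailable": 241,
}

# Hypothetical report with a negative disk value, for illustration only.
bad = dict(cn10, diskavailable=-12)

for name, report in (("CN10", cn10), ("hypothetical bad host", bad)):
    for problem in check_capabilities(report):
        print("%s: %s" % (name, problem))
```

The CN10 values shown in the excerpt are internally consistent (317 + 241 = 558), so a check like this would have to be run against the reports from the hosts that actually show negative numbers.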

3 answers

answered 2013-05-12 15:50:11 -0500

koolhead17

Are all the compute nodes synced with a time server, e.g. via NTP?


answered 2013-05-21 17:08:06 -0500

The reason your instances end up in the Build state rather than going to Error is fixed by this change (currently under review):

https://review.openstack.org/#/c/29780/

As to why the scheduler picks the same host on each of the three runs, it sounds like you don't have the retry filter configured. This is a filter whose role is to stop retries from going back to a host that has already been tried.
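The idea behind a retry filter can be sketched in a few lines. This is a simplified illustration and not nova's actual implementation: `retry_filter`, `schedule_with_retries`, the free-RAM figures, and the `boot` callback are all hypothetical, and picking the host with the most free RAM is just a stand-in for the scheduler's weighting step.

```python
def retry_filter(candidate_hosts, tried_hosts):
    """Keep only hosts that have not already been tried for this request."""
    tried = set(tried_hosts)
    return [host for host in candidate_hosts if host not in tried]


def schedule_with_retries(hosts_free_ram, boot, max_attempts=3):
    """Try up to max_attempts hosts, never retrying one that already failed.

    hosts_free_ram maps host name -> reported free RAM (the weighting stand-in).
    boot is a callable returning True if the instance booted on that host.
    Returns (chosen_host_or_None, list_of_hosts_tried).
    """
    tried = []
    for _ in range(max_attempts):
        candidates = retry_filter(hosts_free_ram, tried)
        if not candidates:
            break
        # Stand-in for weighting: prefer the host reporting the most free RAM.
        host = max(candidates, key=lambda h: hosts_free_ram[h])
        tried.append(host)
        if boot(host):
            return host, tried
    return None, tried


# Hypothetical numbers: compute-3 looks best on paper but always fails to boot.
hosts = {"compute-1": 2048, "compute-2": 4096, "compute-3": 8192}
chosen, attempts = schedule_with_retries(hosts, boot=lambda h: h != "compute-3")
print(chosen, attempts)
```

Without the filtering step, every attempt in this sketch would pick compute-3 again, which matches the behaviour described in the question.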

Hope that helps, Phil


answered 2013-05-22 07:46:16 -0500

RomilGupta

updated 2013-05-22 07:46:34 -0500

Hi, this is a really good question; I once faced the same issue. It happens when you request multiple instances at the same time. To overcome this, you could set the scheduler filters explicitly, for example scheduler_default_filters=ComputeFilter,AllHostsFilter,RetryFilter,ComputeCapabilitiesFilter.


Stats

Asked: 2013-04-16 15:39:40 -0500

Seen: 670 times

Last updated: May 22 '13