Nova-scheduler overloads compute host when deploying multiple instances

asked 2013-04-16 15:39:40 -0500

ldbragst

Hello all,

I'm running into an issue with the nova-scheduler service (Filter Scheduler, enabled by default) and the Hyper-V compute driver. I am deploying several instances with a single 'nova boot' command across five nova-compute hosts. The scheduler starts handing the instances out to separate hosts and they start to come up. After a while I notice that some instances never boot on their scheduled host. Those instances have been rescheduled three times by the scheduler, so I would expect them to report back to the control node in an 'ERROR' state. Instead, on the controller they hang in a 'Building' state even though they are never actually building on the compute nodes.

The part I am curious about is this. Using compute-1, compute-2, and compute-3 as the compute node names: compute-1 and compute-2 have enough resources to handle the instances that fail to boot on compute-3, but the instances never get scheduled there. Either that, or all three reschedule attempts land on compute-3, where scheduling fails because the host is out of resources, and the request fails out after the third attempt. I checked the scheduler and compute logs and found that some of the resource values reported back by the compute services are negative (for example, a negative disk value). After the scheduler weighs the hosts it always chooses compute-3, even though it isn't the best option.

Part of the problem appears to be that the scheduler is somehow getting faulty information about the resources on the Hyper-V compute nodes. I enabled verbose logging on the controller, and the scheduler log shows this during the periodic updates of resource info:

2013-04-15 16:06:05.460 6613 DEBUG nova.openstack.common.rpc.amqp [-] received {u'contextroles': [], u'contextrequestid': u'req-4ace290b-c5a9-4bf8-a706-2af1bbe37b50', u'contextquotaclass': None, u'contextprojectname': None, u'contextservicecatalog': [], u'contextusername': None, u'contextauthtoken': '<sanitized>', u'args': {u'servicename': u'compute', u'host': u'', u'capabilities': [{u'hostmemoryfreecomputed': 4668, u'diskavailable': 241, u'supportedinstances': [[u'i686', u'hyperv', u'hvm'], [u'x8664', u'hyperv', u'hvm']], u'hostmemoryoverhead': 191912, u'hostip': u'', u'hypervisorhostname': u'CN10', u'hostmemoryfree': 4668, u'disktotal': 558, u'hostmemorytotal': 196580, u'diskused': 317}]}, u'contexttenant': None, u'contextinstancelockchecked': False, u'contexttimestamp': u'2013-04-15T21:06:12.356000', u'contextisadmin': True, u'version': u'2.4', u'contextprojectid': None, u'contextuser': None, u'contextreaddeleted': u'no', u'contextuserid': None, u'method': u'updateservicecapabilities', u'contextremoteaddress': None} safelog /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/
2013-04-15 16:06:05.461 6613 DEBUG nova.openstack.common.rpc.amqp [-] unpacked context: {'readdeleted': u'no', 'projectname': None, 'userid': None, 'roles': [], 'timestamp ...
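One way to look for the faulty values described above is to run each reported capabilities dict through a small sanity check. The following is only a sketch, not part of nova: the helper name `find_suspect_values` is hypothetical, the keys are copied as they appear in the log excerpt, and the rule that total minus used should equal available is an assumption that happens to hold for the CN10 entry. The excerpt itself contains no negative values, so the negative case below is a made-up example.

```python
def find_suspect_values(capabilities):
    """Return a list of problems found in one reported capabilities dict.

    Hypothetical helper for log inspection; not part of nova.
    """
    problems = []
    for key, value in sorted(capabilities.items()):
        # Only numeric fields can be negative; skip strings and lists.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value < 0:
            problems.append("%s is negative: %s" % (key, value))

    # Assumption: available disk should equal total minus used.
    total = capabilities.get("disktotal")
    used = capabilities.get("diskused")
    available = capabilities.get("diskavailable")
    if None not in (total, used, available) and total - used != available:
        problems.append(
            "disk figures disagree: total=%s used=%s available=%s"
            % (total, used, available)
        )
    return problems


# Numeric values taken from the CN10 entry in the log excerpt.
reported = {
    "hostmemoryfreecomputed": 4668,
    "diskavailable": 241,
    "hostmemoryoverhead": 191912,
    "hostmemoryfree": 4668,
    "disktotal": 558,
    "hostmemorytotal": 196580,
    "diskused": 317,
}

# Made-up example of a bad report, for illustration only.
hypothetical_bad = dict(reported, diskavailable=-12)

for name, caps in (("reported", reported), ("hypothetical_bad", hypothetical_bad)):
    print(name, find_suspect_values(caps))
```

Running the same check over the capabilities from every compute host would show whether the negative values come from one node or from all of the Hyper-V hosts.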


3 answers


answered 2013-05-12 15:50:11 -0500

koolhead17

Are all the compute nodes synced with a time server, for example via NTP?


answered 2013-05-21 17:08:06 -0500

The reason your instances end up in the Building state rather than going to ERROR is fixed by this change (currently under review).

As for why the scheduler picks the same host on each of the three runs, it sounds like you don't have the retry filter configured. This is a filter whose role is to stop retries from going back to the same host.
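The effect of a retry filter can be illustrated with a toy model. This is a minimal sketch and not nova's scheduler code: the function names, the free-RAM figures, and the choice to weigh hosts by free RAM alone are all assumptions made for illustration, and every attempt is assumed to fail so that the order of retries is visible.

```python
def retry_filter(hosts, attempted_hosts):
    """Drop hosts that were already tried for this request."""
    return [host for host in hosts if host not in attempted_hosts]


def pick_host(hosts, free_ram_mb, attempted_hosts, use_retry_filter):
    """Pick the candidate that appears to have the most free RAM."""
    if use_retry_filter:
        candidates = retry_filter(hosts, attempted_hosts)
    else:
        candidates = list(hosts)
    if not candidates:
        return None
    return max(candidates, key=lambda host: free_ram_mb[host])


def schedule_with_retries(hosts, free_ram_mb, max_attempts, use_retry_filter):
    """Return the hosts tried, in order, assuming every attempt fails."""
    attempted = []
    for _ in range(max_attempts):
        host = pick_host(hosts, free_ram_mb, attempted, use_retry_filter)
        if host is None:
            break
        attempted.append(host)
    return attempted


hosts = ["compute-1", "compute-2", "compute-3"]
# Hypothetical scheduler view in which compute-3 looks like the best host.
free_ram_mb = {"compute-1": 4096, "compute-2": 2048, "compute-3": 8192}

print(schedule_with_retries(hosts, free_ram_mb, 3, use_retry_filter=False))
print(schedule_with_retries(hosts, free_ram_mb, 3, use_retry_filter=True))
```

Without the filter, the toy scheduler keeps returning compute-3 because nothing in its view of the hosts changes between attempts. With the filter, each retry is forced onto a host that has not been tried yet.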

Hope that helps, Phil


answered 2013-05-22 07:46:16 -0500

RomilGupta

updated 2013-05-22 07:46:34 -0500

Hi, this is a really good question; I have faced this issue as well. It happens when you request multiple instances at the same time. To work around it, you could set the scheduler filters explicitly: scheduler_default_filters = ComputeFilter, AllHostsFilter, RetryFilter, ComputeCapabilitiesFilter.
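To confirm which filters a given nova.conf actually enables, the option can be read back with Python's standard configparser. This is a sketch under assumptions: the option is taken to be spelled `scheduler_default_filters` and to live in the `[DEFAULT]` section, the sample file contents are made up, and `enabled_filters` is a hypothetical helper rather than anything nova provides.

```python
import configparser

# Made-up sample of the relevant part of a nova.conf file.
SAMPLE_NOVA_CONF = """
[DEFAULT]
scheduler_default_filters = ComputeFilter , RetryFilter, ComputeCapabilitiesFilter
"""


def enabled_filters(conf_text):
    """Return the filter names listed in scheduler_default_filters."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "scheduler_default_filters", fallback="")
    # Split on commas and tolerate stray spaces around each name.
    return [name.strip() for name in raw.split(",") if name.strip()]


filters = enabled_filters(SAMPLE_NOVA_CONF)
print(filters)
print("RetryFilter enabled:", "RetryFilter" in filters)
```

An empty result means the option is not set in that file, in which case the scheduler falls back to whatever defaults ship with the installed release.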



Asked: 2013-04-16 15:39:40 -0500

Seen: 789 times

Last updated: May 22 '13