# Insufficient compute resources, free memory less than expected

We are seeing an issue with our Newton environment that I hope someone can shed some light on, please.

Often, though not always, when creating a new instance we get an error saying the free memory of the selected compute node is less than what the flavor requested. The big question is: why was this compute node selected in the first place? The scheduler's RAM filter should have eliminated this compute node as a choice.

We have the RAM overcommit ratio left at the default of 1.5 times physical RAM. Yet the nodes selected to build the instance on are already close to fully overcommitted, nowhere near the free RAM needed to support the instance, even with overcommit factored in.
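For reference, here is a minimal sketch of the check I would expect the scheduler's RAM filter to perform. This is illustrative code, not Nova's actual implementation, and the node sizes are made up; it assumes the default ram_allocation_ratio of 1.5:

```python
def ram_filter_passes(total_ram_mb, used_ram_mb, requested_mb,
                      ram_allocation_ratio=1.5):
    """Return True if a host should be allowed to run the instance."""
    # Schedulable capacity includes the overcommit allowance.
    oversubscribed_limit_mb = total_ram_mb * ram_allocation_ratio
    usable_mb = oversubscribed_limit_mb - used_ram_mb
    return usable_mb >= requested_mb

# A hypothetical 64 GB node already overcommitted to 92 GB of allocations:
# the limit is 98304 MB, so only 4096 MB remain, and a 16 GB flavor
# should be rejected.
print(ram_filter_passes(65536, 94208, 16384))  # False
```

With this logic, a nearly fully overcommitted node should never be chosen for a 16 GB flavor, which is why the behavior we see is so puzzling.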

Often, if you retry the instance creation two or three more times, it works. In those cases I assume the scheduler picks a different compute node that actually has enough resources to cover the flavor.

At times it may also be that zero compute nodes can cover the new instance's flavor, but in that case I would expect the error to be something like "No valid host was found. There are not enough hosts available".

I tried adding the AggregateRamFilter, and it did not seem to help. Why are instances scheduled to build on compute nodes that don't have the available RAM, even with the overcommit allowance included?

I hope the logs below shed a little light on what happens.

Example: Horizon shows this error:

```
Message  Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 8e745b29-be1f-42b5-8990-bcc07cb00337. Last exception: Insufficient compute resources: Free memory 3896.50 MB < requested 16384 MB.
Code     500
Details  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 492, in build_instances filter_properties, instances[0].uuid) File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 184, in populate_retry raise exception.MaxRetriesExceeded(reason=msg)
Created  June 13, 2019, 4:19 p.m.
```


The controller's conductor.log shows this same error, first for node-170 and then for node-169.

```
2019-06-13 16:19:57.676 7769 ERROR nova.scheduler.utils [req-b0e5d9f4-0429-41e3-a1c7-f71f818f0ba8 b531c96d3755453f9fb02480aeca9554 6ac3d8f7357e4cc7b5dc5ddbea6a48f4 - - -] [instance: 8e745b29-be1f-42b5-8990-bcc07cb00337] Error from last host: node-170.mysite.com (node node-170.mysite.com): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1783, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1938, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=e.format_message())\n', u'RescheduledException: Build of instance 8e745b29-be1f-42b5-8990-bcc07cb00337 was re-scheduled: Insufficient compute resources: Free memory 1848.50 MB < requested 16384 MB.\n']
2019-06-13 16:19:59.323 7760 ERROR nova.scheduler.utils [req-b0e5d9f4-0429-41e3-a1c7-f71f818f0ba8 b531c96d3755453f9fb02480aeca9554 6ac3d8f7357e4cc7b5dc5ddbea6a48f4 - - -] [instance: 8e745b29-be1f-42b5-8990-bcc07cb00337] Error from last host: node-169.mysite.com (node node-169.mysite.com): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1783, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1938, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason ...
```

Is ram_filter supposed to check only the total RAM a compute node has, or the total free RAM a compute node has?

(2019-06-17 12:55:20 -0500)

Is this related to this bug? https://bugs.launchpad.net/mos/+bug/1...

(2019-06-17 13:42:44 -0500)


I believe I solved this issue with this bug and patch: https://review.opendev.org/#/c/475057/1

As the patch in that bug indicates, I commented out these lines in /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py on each of our Newton compute nodes:

```python
#self.compute_node.free_ram_mb = max(0, self.compute_node.free_ram_mb)
#self.compute_node.free_disk_gb = max(0, self.compute_node.free_disk_gb)
```


Basically, Nova was reporting free memory and free disk as 0 instead of the real negative (overcommitted) values. This caused the scheduler to think there was still overcommit room on the compute node, while in reality the node was full and almost totally overcommitted, too overcommitted for the flavor being requested.
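A minimal sketch of why the clamp misleads the scheduler. This is illustrative code, not Nova's actual implementation, and the node sizes are made up; it assumes the default ram_allocation_ratio of 1.5:

```python
def scheduler_usable_mb(total_mb, reported_free_mb, ratio=1.5):
    """What a RAM filter believes is still schedulable, given the free
    RAM reported by the compute node's resource tracker."""
    used_mb = total_mb - reported_free_mb
    return total_mb * ratio - used_mb

total = 65536               # hypothetical 64 GB node
real_free = 65536 - 94208   # -28672: 92 GB already allocated (overcommitted)

# With the clamp, the node reports free RAM as max(0, real_free) == 0,
# so the scheduler computes 32768 MB of apparent headroom and a 16 GB
# flavor passes the filter even though the node cannot host it.
print(scheduler_usable_mb(total, max(0, real_free)))  # 32768.0

# With the real negative value, only 4096 MB remain against the 1.5x
# limit, and a 16 GB flavor is correctly rejected.
print(scheduler_usable_mb(total, real_free))          # 4096.0
```

In other words, once free RAM is clamped to zero, the most the scheduler can infer is that the node is exactly full, so the overcommit allowance always appears untouched, and the build fails on the compute node instead of being filtered out up front.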

