No valid host was found during instance migrate

asked 2018-07-18 01:32:52 -0500

tonyp12

I am not sure what I am missing here. I am trying to migrate instances from one host to another, but after migrating some instances successfully I am now stuck at "No valid host was found". It seems as though nova thinks there are no resources, yet no log entries at all mention the destination host.

I have a 3-node setup: 1 x controller and 2 x compute nodes. I call the compute hosts compute-0 and compute-1. The compute nodes have shared iSCSI storage from cinder. Before the live migrations, instances were running on both hosts without issue.

Version is Pike.

I was initially trying to migrate instances using the "openstack server migrate" command, but I was getting errors about there not being enough free disk on the destination. This is due to a bug with that specific command; using the "nova live-migration" command instead, I was able to live migrate immediately. But now I have run into another issue and cannot work through it.

I am checking these logs: nova-scheduler.log, nova-api.log, nova-conductor.log, nova-placement-api.log, nova-manage.log, nova-rowsflush.log

I am trying to migrate an instance from compute-0 to compute-1. I ran a tail on all of those logs and received the output below. I cannot see any detail pertaining to compute-1, but the logs state that there are no available hosts. The compute service on compute-0 is disabled to prevent new VMs from being scheduled there.

I am also unable to spawn new instances.

I am trying to do this with the command: nova live-migration 8b51e704-b2fb-4674-8d82-debe0c9ea9d2 --block-migrate

Log output:

2018-07-18 06:06:37.289 129727 DEBUG nova.api.openstack.wsgi [req-1cc7ff66-7f96-41e1-ad50-ee1b2040e0e2 8581214b931a55344f5d9b39916ac246b0ef3bc441914e5711f846c34d50c731 a1c09ae7084b4fbe9de5d7a17112b4c0 - c1fbcd738b6b4b40a82d82e5e010aa4d c1fbcd738b6b4b40a82d82e5e010aa4d] Calling method '<bound method Versions.index of <nova.api.openstack.compute.versions.Versions object at 0x7f74aaf45e10>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:612
2018-07-18 06:06:37.291 129727 INFO nova.api.openstack.requestlog [req-1cc7ff66-7f96-41e1-ad50-ee1b2040e0e2 8581214b931a55344f5d9b39916ac246b0ef3bc441914e5711f846c34d50c731 a1c09ae7084b4fbe9de5d7a17112b4c0 - c1fbcd738b6b4b40a82d82e5e010aa4d c1fbcd738b6b4b40a82d82e5e010aa4d] 192.168.18.102 "OPTIONS /" status: 200 len: 439 microversion: - time: 0.002323

==> nova-placement-api.log <==
2018-07-18 06:06:37.481 129735 DEBUG nova.api.openstack.placement.requestlog [req-a08d6981-96d0-4f79-aaaf-630e03995fcb 5418126b8be04f8cb00ead0e9714df3b b7fd1b7ff742439f96c8b46d13f3f963 - default default] Starting request: 192.168.18.102 "GET /placement/resource_providers/0d2f7450-c02c-4a12-a29a-54cb34ed07d0/aggregates" __call__ /usr/lib/python2.7/site-packages/nova/api/openstack/placement/requestlog.py:38
2018-07-18 06:06:37.503 129735 INFO nova.api.openstack.placement.requestlog [req-a08d6981-96d0-4f79-aaaf-630e03995fcb 5418126b8be04f8cb00ead0e9714df3b b7fd1b7ff742439f96c8b46d13f3f963 - default default] 192.168.18.102 "GET /placement/resource_providers/0d2f7450-c02c-4a12-a29a-54cb34ed07d0/aggregates" status: 200 len: 18 microversion: 1.1
2018-07-18 06:06:37.515 129735 DEBUG nova.api.openstack.placement.requestlog [req-eeaa41ec-44d2-4cdd-b25b-d2af28279861 5418126b8be04f8cb00ead0e9714df3b b7fd1b7ff742439f96c8b46d13f3f963 - default default] Starting request: 192.168.18.102 "GET /placement/resource_providers/0d2f7450-c02c-4a12-a29a-54cb34ed07d0/inventories" __call__ /usr/lib/python2.7/site-packages/nova/api/openstack/placement/requestlog.py:38
2018-07-18 06:06:37.538 129735 INFO nova.api.openstack.placement.requestlog [req-eeaa41ec-44d2-4cdd-b25b-d2af28279861 5418126b8be04f8cb00ead0e9714df3b b7fd1b7ff742439f96c8b46d13f3f963 - default default] 192.168.18.102 "GET /placement/resource_providers/0d2f7450-c02c-4a12-a29a-54cb34ed07d0/inventories" status: 200 len: 406 microversion: 1.0

==> nova-api.log <==
2018-07-18 06:06:37.951 4617 INFO nova.metadata.wsgi.server [-] 192.168 ...

Comments

I think this issue might have something to do with the fact that the instance flavours define swap values, which consume disk space on the host for swap alone.

  1. I haven't confirmed this.
  2. If this is true, there's plenty of free disk space on the hosts, so how can I work around it?
tonyp12 ( 2018-07-18 01:37:12 -0500 )

The scheduler log doesn't mention compute-1 at all. It filters out compute-0, then says that there are no more hosts. Does Nova know about compute-1? Run a command like openstack compute service list or openstack hypervisor list to check.

Bernd Bausch ( 2018-07-18 05:04:49 -0500 )
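
The check suggested above can also be scripted. This is a minimal sketch, assuming the service list has been exported as JSON (for example with the CLI's `-f json` output formatter) and that each row carries `Binary`, `Host`, `Status` and `State` keys. The sample rows and the function name `schedulable_compute_hosts` are hypothetical, not output from this cluster.

```python
import json

# Hypothetical sample of `openstack compute service list -f json` output.
# Host names follow the question; the values are made up for illustration.
SAMPLE = """
[
  {"ID": 7, "Binary": "nova-compute", "Host": "compute-0",
   "Zone": "nova", "Status": "disabled", "State": "up"},
  {"ID": 8, "Binary": "nova-compute", "Host": "compute-1",
   "Zone": "nova", "Status": "enabled", "State": "up"},
  {"ID": 3, "Binary": "nova-scheduler", "Host": "controller",
   "Zone": "internal", "Status": "enabled", "State": "up"}
]
"""


def schedulable_compute_hosts(services):
    """Return hosts whose nova-compute service is both enabled and up."""
    return [
        svc["Host"]
        for svc in services
        if svc["Binary"] == "nova-compute"
        and svc["Status"] == "enabled"
        and svc["State"] == "up"
    ]


if __name__ == "__main__":
    print(schedulable_compute_hosts(json.loads(SAMPLE)))
```

If the destination host is missing from the result, the scheduler has no candidate to consider. If it is present, as turned out to be the case here, the rejection is coming from a resource check rather than from service state.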

Sorry, I forgot to add that yes it does.

I have since found a workaround. (PS: this website stops working sometimes; it seems like a DB connection issue.) The workaround was to set disk overcommit higher than 1.0. My theory about disk seems correct. BUT the value I changed has a warning that it...

tonyp12 ( 2018-07-18 06:26:44 -0500 )

...is deprecated in Pike, but it works. Another thing that helped me was to launch a VM with a flavour that has all storage values set to '0' (swap, disk, etc.). This launched successfully.

After increasing the host disk overcommit in nova.conf, live migration works.

tonyp12 ( 2018-07-18 06:28:21 -0500 )
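
The swap theory and the overcommit workaround fit together arithmetically. The sketch below is a simplified model of the kind of check nova's DiskFilter performs: the flavour's root, ephemeral and swap sizes must fit within total disk multiplied by the allocation ratio, minus what is already accounted as used. The function name `disk_fits` and all numbers are hypothetical, and it is an assumption that `disk_allocation_ratio` is the nova.conf option that was changed.

```python
def disk_fits(total_gb, free_gb, root_gb, ephemeral_gb, swap_mb,
              allocation_ratio=1.0):
    """Simplified disk check: does the flavour fit on the host?

    free_gb stands for whatever free-disk figure the scheduler tracks
    for the host (an assumption; it may differ from what `df` reports).
    """
    # Flavour root and ephemeral disks are in GB, swap is in MB.
    requested_mb = 1024 * (root_gb + ephemeral_gb) + swap_mb
    total_mb = total_gb * 1024
    used_mb = total_mb - free_gb * 1024
    # Overcommit raises the ceiling above the physical total.
    limit_mb = total_mb * allocation_ratio
    usable_mb = limit_mb - used_mb
    return usable_mb >= requested_mb


if __name__ == "__main__":
    # Hypothetical host: 100 GB total, 10 GB still accounted as free.
    # Hypothetical flavour: 20 GB root, no ephemeral disk, 4096 MB swap.
    print(disk_fits(100, 10, 20, 0, 4096))                        # ratio 1.0
    print(disk_fits(100, 10, 20, 0, 4096, allocation_ratio=1.5))  # overcommit
    print(disk_fits(100, 10, 0, 0, 0))                            # all-zero flavour
```

Under these made-up numbers the flavour is rejected at a ratio of 1.0 and accepted at 1.5, and a flavour with all storage values set to 0 always passes, which matches both observations reported above.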

I spent 6 hours looking for ghost logs today lol.

Output of openstack compute service list:

| 8 | nova-compute | overcloud-novacompute-1.company.com | nova | enabled | up |

It's back and running again.

tonyp12 ( 2018-07-18 06:30:36 -0500 )