
instance cannot be deleted from the hypervisor

asked 2016-09-11 23:41:45 -0600

roymaztang

updated 2016-09-12 08:36:49 -0600

I installed OpenStack Kilo on Ubuntu 14.04. However, the compute nodes are in the down state, and restarting nova-compute does not help.

I then looked into the nova-compute log and found the following messages appearing every time before the compute node went down:

2016-09-11 20:04:45.626 6275 INFO nova.compute.manager [req-BLABLA] [instance: INSTANCE_ID] Terminating instance

2016-09-11 20:04:45.630 6275 INFO nova.virt.libvirt.driver [-] [instance: INSTANCE_ID] Instance destroyed successfully.

The same messages are repeated for the SAME INSTANCE_ID every time I restart nova-compute. About 30 minutes after each restart the node goes down again, with no other messages showing up in between. This suggests that the instance was not actually destroyed as reported. However, virsh list returns nothing, and there are no instances in /var/lib/nova/instances/. The instance seems to have been deleted from both the nova database and the hypervisor, so why does nova keep terminating an instance that doesn't exist? Where does nova get the instance ID from?

BTW, I already set running_deleted_instance_action=reap in nova.conf, but it does not seem to work.
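To rule out a typo or a setting placed in the wrong section, the value can be read back from the file. The following is a minimal sketch in shell, not anything nova ships: the helper name nova_opt is made up for illustration, and it assumes the option lives in the [DEFAULT] section of /etc/nova/nova.conf, which I believe is the case in Kilo. The two related options, running_deleted_instance_poll_interval and running_deleted_instance_timeout, are real nova options. As far as I know the poll interval defaults to 1800 seconds, which would match the 30-minute gap, but that link is unconfirmed.

```shell
#!/bin/sh
# Hypothetical helper: print the value of OPTION in [SECTION] of an
# INI-style file such as nova.conf. Prints nothing if the option is unset.
# Usage: nova_opt FILE SECTION OPTION
nova_opt() {
    awk -v section="$2" -v option="$3" '
        # Remember which [section] we are currently in.
        /^[ \t]*\[.*\][ \t]*$/ {
            current = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", current)
            next
        }
        # Skip comments and blank lines.
        /^[ \t]*(#|;|$)/ { next }
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            # If the option is repeated, the last occurrence wins.
            if (key == option) found = val
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example calls (assumed path and section):
#   nova_opt /etc/nova/nova.conf DEFAULT running_deleted_instance_action
#   nova_opt /etc/nova/nova.conf DEFAULT running_deleted_instance_poll_interval
#   nova_opt /etc/nova/nova.conf DEFAULT running_deleted_instance_timeout
```

If the first call prints nothing, the option is either missing or sits under a different section header than the one assumed here.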

This problem seems to be related to an unresolved bug report in https://bugs.launchpad.net/nova/+bug/1520396


3 answers


answered 2016-09-16 09:22:26 -0600

roymaztang

I figured out the solution myself. The problem was caused by an unhealthy Ceph cluster: the nova-compute service could not access the instance data, so it treated already-deleted instances as still existing.

Some Ceph OSDs were down because of disk failures and network issues, leaving several PGs (placement groups) unclean and inactive. I manually removed the failed OSDs and marked them as lost, restarted the OSDs that were still online, abandoned the lost data, and fixed all the failed PGs. After that, the compute nodes came back to the 'up' state, and instances can be deleted properly.
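A rough sketch of how such a check might look in shell. It is not a record of the exact commands used above. The helper names down_osds and print_osd_removal_plan are made up for illustration, and the removal steps are the usual manual OSD removal sequence as I understand it. They are only printed, never executed, because marking an OSD as lost discards data.

```shell
#!/bin/sh
# Hypothetical helpers; nothing in this file changes the cluster.

# Read `ceph osd tree` output on stdin and print the numeric IDs of OSDs
# whose status column says "down".
down_osds() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") {
                id = $i
                sub(/^osd\./, "", id)
                print id
            }
        }
    }'
}

# Print (do not run) a manual removal sequence for one failed OSD.
# Review every line before running anything: `ceph osd lost` is destructive.
print_osd_removal_plan() {
    id=$1
    echo "ceph osd out $id"
    echo "ceph osd lost $id --yes-i-really-mean-it"
    echo "ceph osd crush remove osd.$id"
    echo "ceph auth del osd.$id"
    echo "ceph osd rm $id"
}

# Read-only checks to run first, on a node with an admin keyring:
#   ceph health detail
#   ceph pg dump_stuck inactive
#   ceph pg dump_stuck unclean
#   ceph osd tree | down_osds
#   for id in $(ceph osd tree | down_osds); do print_osd_removal_plan "$id"; done
```

An OSD that is down only because of a network problem should come back once the network is fixed, so it is worth separating those from disks that are really dead before removing anything.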


answered 2016-09-13 12:15:19 -0600

fifi

updated 2016-09-13 12:16:38 -0600

Restart libvirt on the compute node: service libvirt-bin restart

Then restart nova-compute on the compute node: service nova-compute restart


Comments

I restarted libvirt and nova-compute, but nothing changed.

roymaztang ( 2016-09-13 22:23:15 -0600 )

Reinstall glance and python-glanceclient on your controller. Then, in /etc/glance/glance-api.conf, add the line enable_v2_api = True. Reinstall nova-api, nova-conductor and python-novaclient on the controller.

fifi ( 2016-09-14 15:02:38 -0600 )

On your compute node, uninstall nova-compute and then reinstall it as follows:

apt-get update -y && apt-get dist-upgrade -y

apt-get install -y nova-compute sysfsutils

fifi ( 2016-09-14 15:04:35 -0600 )

Finally, restart all the services mentioned above and run nova service-list to make sure all of them are up and running. Then you can launch your VMs and delete them without any problem.

fifi ( 2016-09-14 15:06:40 -0600 )

Sometimes reinstalling some services may affect other services, so do not panic if you get some weird error messages. Reinstalling the affected services should solve the problem.

fifi ( 2016-09-14 15:07:56 -0600 )

answered 2016-09-12 00:51:38 -0600

tahder

Try restarting your nova and neutron services:

service neutron-openvswitch-agent restart

service openstack-nova-compute restart

Also check that the nova/neutron versions match between the compute node and the controller.


Comments

Restarting doesn't work (the services are named openvswitch-switch and nova-compute on my machine). The nova/neutron versions match between the controller and the compute node (2.23.0/2.4.0).

roymaztang ( 2016-09-12 07:09:27 -0600 )

Try restarting all your nova services to see if that solves it, or try rebooting the controller and the compute node.

    for svc in api conductor scheduler cert consoleauth compute novncproxy network; do
        systemctl restart nova-$svc
    done

Or check /var/lib/nova/instances on the compute node.
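A small read-only sketch, in shell, of checking what is left in the instances directory and in libvirt. Plain virsh list only shows running domains, so --all is used here to include shut-off ones. The helper names uuid_lines and list_leftovers are made up for illustration, and the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helpers; they only read state and change nothing.

# Keep only stdin lines that look like an instance UUID, sorted and de-duplicated.
# This filters out entries such as _base or locks in the instances directory.
uuid_lines() {
    grep -E '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' | sort -u
}

# Show instance UUIDs found on disk and in libvirt on this compute node.
# Usage: list_leftovers [INSTANCES_DIR]
list_leftovers() {
    echo "Instance directories on disk:"
    ls -1 "${1:-/var/lib/nova/instances}" 2>/dev/null | uuid_lines
    echo "Domains known to libvirt (running or not):"
    virsh list --all --uuid 2>/dev/null | uuid_lines
}

# Example call on the compute node:
#   list_leftovers
```

If the INSTANCE_ID from the log appears in neither list, the instance is gone from the node itself, and the ID is more likely coming from somewhere else, such as the database or the storage backend.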

tahder ( 2016-09-12 22:50:42 -0600 )

