After host crash, Ceph VM volumes stay locked

asked 2019-08-13 04:16:19 -0500

ecki

We had a crash on one of our OpenStack hosts. After the reboot, the VMs on that machine could not start because of filesystem errors. Some investigation later we noticed that the VMs' ephemeral volumes (which we store in Ceph RBD) still held a write lock.

Manually removing the lock allowed us to start the VMs again, but I wonder whether there should be an automated process for this. Shouldn't the Ceph client normally detect that it is the same host re-acquiring the lock? (We run some of the OpenStack and Ceph services in Docker containers, so it might be a problem with new "host" names.)
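For anyone hitting the same thing, this is roughly what clearing a stale lock by hand looks like with the `rbd` CLI. The pool name, image name, lock ID, and client ID below are illustrative (Nova typically names RBD ephemeral disks `<instance-uuid>_disk`, but check your own setup):

```
# list clients currently holding a lock on the image
rbd lock ls vms/<instance-uuid>_disk
# Locker        ID                      Address
# client.4235   auto 139872477359616    192.168.1.12:0/1029334

# remove the stale lock: image, then the lock ID, then the locker name,
# both copied from the listing above
rbd lock rm vms/<instance-uuid>_disk "auto 139872477359616" client.4235
```

Note that auto-generated lock IDs like `auto 139872477359616` contain a space, so they need quoting.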



I recently got bitten by this. I'm thinking of a boot-time/one-time script that walks through the instance configs on the hypervisor, but that would require the hypervisors to have the cephx credentials on the host. Maybe that isn't a heavy lift or a big security risk.
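A minimal sketch of what such a boot-time script could look like. It assumes a typical Nova-on-RBD layout (ephemeral disks in a pool named `vms` as `<instance-uuid>_disk`, libvirt domain XMLs under /etc/libvirt/qemu) and a cephx keyring on the hypervisor that permits `rbd lock` operations; none of those specifics come from the thread, so adjust to your deployment:

```shell
#!/bin/sh
# Hypothetical boot-time cleanup sketch, not an official OpenStack/Ceph tool:
# walk the libvirt instance configs on this hypervisor and clear any RBD
# locks left over from a crash, before the VMs are started.
POOL=vms   # assumed Nova ephemeral pool name

# Parse one "rbd lock ls" listing: skip the two header lines; the locker is
# the first column, the address the last, and the lock ID is everything in
# between (auto-generated IDs like "auto 139872477359616" contain a space).
# Prints: locker <TAB> lock-id
parse_locks() {
    awk 'NR > 2 {
        locker = $1
        id = ""
        for (i = 2; i < NF; i++) id = id (i > 2 ? " " : "") $i
        print locker "\t" id
    }'
}

if command -v rbd >/dev/null 2>&1; then
    for conf in /etc/libvirt/qemu/instance-*.xml; do
        [ -e "$conf" ] || continue
        uuid=$(awk -F'[<>]' '/<uuid>/ { print $3; exit }' "$conf")
        image="$POOL/${uuid}_disk"
        rbd lock ls "$image" | parse_locks |
        while IFS="$(printf '\t')" read -r locker id; do
            echo "removing stale lock '$id' held by $locker on $image"
            rbd lock rm "$image" "$id" "$locker"
        done
    done
fi
```

This blindly removes every lock it finds, which is only safe at boot before any instance has started; a smarter version would check that the locker's address matches this host.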

Had you come up with a solution for this?

peter eisch ( 2020-03-31 09:33:11 -0500 )