How do I debug a volume stuck in the deleting state?
Today I ran into a scenario where my Icehouse OpenStack installation had existing Cinder volumes that became stuck in the deleting state when I tried to delete them.
After several hours of probing the system, I was able to identify the root cause. When the Cinder volumes were created, the cinder-volume service was associated with one hostname, but at some point the hostname of the machine hosting the Cinder service changed (e.g. it went from "cinderHost" to "cinderHost.local"). I don't know when this happened, but when the Cinder service was next restarted, its service name changed to reflect the new hostname.
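The following is a minimal sketch of why a hostname change renames the service, assuming the service name comes from Cinder's `host` option, which falls back to the machine's hostname when cinder.conf does not set it. The function `effective_service_host` is hypothetical and only models that fallback; it is not part of Cinder.

```python
import socket
from typing import Optional


def effective_service_host(configured_host: Optional[str] = None) -> str:
    """Return the name a cinder service would register under (sketch).

    Assumption: this models Cinder's `host` option, which defaults to the
    machine's hostname when it is not set explicitly in cinder.conf.
    """
    return configured_host or socket.gethostname()


if __name__ == "__main__":
    # Not configured: tracks the OS hostname, so renaming the machine
    # (e.g. "cinderHost" -> "cinderHost.local") renames the service too.
    print(effective_service_host())
    # Configured explicitly: the name stays the same across hostname changes.
    print(effective_service_host("cinderHost"))
```

Under that assumption, pinning the host name in the configuration would keep the service name stable even if the OS hostname changes later.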
By itself, this is not terrible: new volumes could be created and deleted through the new Cinder service. The problem was that each existing volume's record in the Cinder database stores the name of the cinder-volume service that was used to create it. Because that old service name was no longer in service when the delete was attempted, the volume became stuck in the deleting state.
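One way to spot this situation quickly is to compare each volume's `host` against the cinder-volume services that are actually alive. The sketch below uses an in-memory SQLite database with simplified, hypothetical versions of the `services` and `volumes` tables (the real Cinder tables have more columns and usually live in MySQL). It assumes a service counts as "up" when it is not disabled and its `updated_at` heartbeat is no older than 60 seconds, which is assumed here to match Cinder's default `service_down_time`. Host names are compared exactly; the helper names are illustrative only.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: a service whose heartbeat is older than this is considered down.
SERVICE_DOWN_TIME = timedelta(seconds=60)

# Simplified stand-ins for Cinder's `services` and `volumes` tables.
SCHEMA = """
CREATE TABLE services (host TEXT, "binary" TEXT, disabled INTEGER, updated_at TEXT);
CREATE TABLE volumes (id TEXT, status TEXT, host TEXT);
"""


def live_volume_hosts(conn, now):
    """Hosts of enabled cinder-volume services with a recent heartbeat."""
    rows = conn.execute(
        """SELECT host, updated_at FROM services
           WHERE "binary" = 'cinder-volume' AND disabled = 0"""
    )
    return {
        host
        for host, updated_at in rows
        if now - datetime.fromisoformat(updated_at) <= SERVICE_DOWN_TIME
    }


def find_orphaned_volumes(conn, now):
    """Volumes whose `host` has no live cinder-volume service to act on them."""
    live = live_volume_hosts(conn, now)
    rows = conn.execute("SELECT id, status, host FROM volumes ORDER BY id")
    return [row for row in rows if row[2] not in live]


def build_demo_db(now):
    """Hypothetical data reproducing the hostname change described above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    stale = (now - timedelta(days=3)).isoformat()     # old name, no heartbeat
    fresh = (now - timedelta(seconds=10)).isoformat()  # new name, alive
    conn.executemany(
        "INSERT INTO services VALUES (?, ?, ?, ?)",
        [("cinderHost", "cinder-volume", 0, stale),
         ("cinderHost.local", "cinder-volume", 0, fresh)],
    )
    conn.executemany(
        "INSERT INTO volumes VALUES (?, ?, ?)",
        [("vol-1", "deleting", "cinderHost"),
         ("vol-2", "available", "cinderHost.local")],
    )
    return conn


if __name__ == "__main__":
    now = datetime(2014, 6, 1, 12, 0, 0)
    for vol_id, status, host in find_orphaned_volumes(build_demo_db(now), now):
        print(f"{vol_id} ({status}) belongs to {host}, which has no live service")
```

With the demo data, only `vol-1` is reported, because its recorded host `cinderHost` still has a service row but no recent heartbeat. Against a real deployment the same comparison would have to be adapted to the actual schema and backend naming (for example `host@backend` in multi-backend setups).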
My question is: how would you debug this efficiently if it happened again? I was surprised not to find any ERROR messages in any of the Cinder logs. Enabling debug logging in Cinder didn't really help either, since nothing it printed looked alarming; there was nothing about being unable to reach the target cinder-volume service.
Is Cinder designed so that it simply queues requests targeted at a disabled volume service, and does not treat the service's unavailability as an error?