nova rebuild fails when the instance has ceph snapshot

asked 2019-11-19 04:58:18 -0600

Prasanna Ram

Hi Team,

We have OpenStack with a Ceph storage backend. We use ceph snapshots as our backup strategy for the root drive and the additional drives that are attached. What we have observed is that whenever we do a nova rebuild on a server that has a snapshot in ceph, the rebuild completes without error, but the actual rebuild doesn't happen at all. After the rebuild we still have the same corrupted VM; at times we rebuild the VM to a fresh new OS, e.g. from Ubuntu to CentOS, but after the rebuild we still have the same Ubuntu. What all these cases have in common is that ceph has a snapshot of the instance's root drive. For instances that don't have a ceph snapshot, the rebuild works like a charm.

Has anyone faced this kind of issue? Please guide us on how to resolve it.

Regards, Ram.



I haven't done much rebuilding, so my experience is limited here. But have you turned on debug logs for nova? I'd expect to see the commands it's trying to execute, maybe there's a hint what could have gone wrong. If I have the time I'll try to reproduce that.

eblock ( 2019-11-22 02:07:22 -0600 )

Your description is accurate, I was able to reproduce this. I'll try to find out more.

eblock ( 2019-11-22 02:32:51 -0600 )

2 answers


answered 2019-11-22 02:48:15 -0600

eblock

Alright, I got it. When you rebuild an instance, the underlying rbd image has to be deleted so that the same ID can be reused. You can see this for a really short time if you run watch -n 0.2 rbd info pool/image_disk while an instance without snapshots is rebuilding. But since it's not possible to delete an rbd image that still has snapshots, the rebuild (whether triggered via horizon or the CLI) fails at that step and the instance is reverted to its previous, unrebuilt state.
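You can verify this on the ceph side. A quick check might look like the following, assuming the default nova pool name "vms" and the default "<instance_uuid>_disk" image naming (both are configurable, so adjust to your deployment):

```shell
# List snapshots on the instance's root disk. Any output here means
# a rebuild of this instance will silently fail to replace the image.
rbd snap ls vms/<instance_uuid>_disk

# This is essentially what nova runs into during the rebuild: an rbd
# image that still has snapshots cannot be removed. Running it by hand
# fails with an error telling you to 'rbd snap purge' the image first.
rbd rm vms/<instance_uuid>_disk
```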



Many thanks for your comments. I wrote a job for instances that have ceph snaps: it first shuts off the VM and deletes the snaps on the ceph backend, then triggers a rebuild from openstack, and it works.

Is there a simple way to find the total space consumed by snapshots alone in ceph?
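For reference, a minimal sketch of such a job (the pool name "vms" and the "<uuid>_disk" naming are the nova defaults; the UUID and image ID below are placeholders, and error handling is omitted):

```shell
#!/bin/sh
# Workaround sketch: stop the VM, purge its ceph snapshots, then rebuild.
UUID=<instance_uuid>
IMAGE_ID=<new_image_id>

openstack server stop "$UUID"

# Remove all snapshots of the root disk so nova can delete the rbd image.
rbd snap purge vms/"${UUID}_disk"

# Now the rebuild actually replaces the disk.
openstack server rebuild --image "$IMAGE_ID" "$UUID"
```

Note that protected snapshots would have to be unprotected (rbd snap unprotect) before the purge can remove them.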

Prasanna Ram ( 2020-01-10 23:15:02 -0600 )

answered 2019-12-04 23:18:13 -0600

Prasanna Ram

Thanks for the explanation. This issue seems to exist in OpenStack Queens with Ceph Luminous. Not sure if it will be fixed in later releases.



Although we're already running Nautilus (and this also happens there), our cloud still runs Ocata, so it's hard to say if this still applies to newer OpenStack releases. We plan to upgrade OpenStack to a current release; I'm curious whether this has been fixed, since Ceph has become one of the most widely used storage backends for OpenStack.

eblock ( 2019-12-05 09:08:43 -0600 )


Seen: 241 times

Last updated: Dec 04 '19