Just sharing my implementation, though it is more of a workaround. All my instances boot from volume on Ceph, which is what makes this approach easy for me. I use a monitoring tool like Nagios to keep track of which compute nodes go down. When a compute node is detected as down, the monitor triggers a script that updates the host field of each affected instance in MySQL and then does a hard reboot, which recreates the instance's XML file on the new host. I should also add that I have 8 compute nodes: only 7 are active, and the 8th always acts as a passive standby that takes over running the instances from the dead node.
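For anyone wanting to try something similar, here is a minimal sketch of what such a failover script might look like. It is not my actual script. It assumes the script runs as a Nagios host event handler that receives the `$HOSTSTATE$` and `$HOSTSTATETYPE$` macros, that Nova's database has an `instances` table with `uuid`, `host` and `deleted` columns (check the schema for your release; some releases also track a `node` column that may need the same change), and that the host names are hypothetical. It only builds the SQL and `nova reboot --hard` commands rather than executing them, so you can inspect the plan first.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed queries against Nova's `instances` table; verify against your schema.
# Rows with deleted = 0 are instances that have not been (soft-)deleted.
SELECT_SQL = "SELECT uuid FROM instances WHERE host = %s AND deleted = 0"
UPDATE_SQL = "UPDATE instances SET host = %s WHERE host = %s AND deleted = 0"


@dataclass
class FailoverPlan:
    """Steps to run in order; nothing is executed by this module."""
    select_params: Tuple[str, ...]
    update_params: Tuple[str, ...]
    reboot_cmds: List[List[str]]


def should_fail_over(host_state: str, state_type: str) -> bool:
    """Act only on a confirmed outage: Nagios reports HARD once retries are used up."""
    return host_state.upper() == "DOWN" and state_type.upper() == "HARD"


def plan_failover(dead_host: str, standby_host: str,
                  instance_uuids: List[str]) -> FailoverPlan:
    """Plan moving instances from dead_host to standby_host.

    instance_uuids is what SELECT_SQL returned for dead_host. This only works
    because the disks live on Ceph volumes, not on the dead node's local disk.
    """
    if dead_host == standby_host:
        raise ValueError("standby host must differ from the dead host")
    # The DB update must happen first, so the hard reboot is handled by
    # nova-compute on the standby host, which then regenerates the XML.
    cmds = [["nova", "reboot", "--hard", uuid] for uuid in instance_uuids]
    return FailoverPlan(
        select_params=(dead_host,),
        update_params=(standby_host, dead_host),
        reboot_cmds=cmds,
    )


if __name__ == "__main__":
    # Hypothetical example: compute3 died, compute8 is the passive node.
    if should_fail_over("DOWN", "HARD"):
        plan = plan_failover("compute3", "compute8", ["uuid-a", "uuid-b"])
        print(UPDATE_SQL, plan.update_params)
        for cmd in plan.reboot_cmds:
            print(" ".join(cmd))
```

To run it for real, you would execute `SELECT_SQL` and `UPDATE_SQL` with a MySQL driver such as PyMySQL (for example `cursor.execute(UPDATE_SQL, plan.update_params)` followed by a commit), then run each reboot command with `subprocess.run(cmd, check=True)`. Keeping planning separate from execution makes it easy to dry-run the handler before trusting it with a real outage.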

Hope that helps.