OpenStack Instance High Availability (How to make pet VMs highly available?)

asked 2015-02-01 14:04:44 -0600

Moh

updated 2015-02-04 09:45:50 -0600

Hi OpenStack experts,

As my question title implies, I am looking for a stable way to provide high availability for my non-cloud-ready (pets) instances when non-commercial hypervisors (e.g. ESXi) are used (for example, in the case of KVM). My primary question is:

How can instances (pets) be brought back online after a host failure in OpenStack?

I searched a lot on the web and found the following methods for (pets) instance HA in OpenStack:

  • Use the nova evacuate command: this is a manual admin task, not an automatic instance fail-over mechanism.
  • Use Heat service/application/instance-level high availability: I found no usable documentation for this method.
  • Neutron + allowed address pairs: this only applies to stateless applications (e.g. a web server) and not to, for example, a WordPress instance with a built-in MySQL database.
  • vm-ensembles: this is only a blueprint and is not implemented.
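For reference, the allowed-address-pair approach amounts to telling Neutron that a port may carry traffic for an extra (virtual) IP that floats between instances. Below is a minimal sketch of the request body for a Neutron port update (PUT /v2.0/ports/{port_id}). The helper name and the address 192.0.2.100 are placeholders for illustration only, and moving the virtual IP between instances (e.g. with Keepalived) is a separate piece that is assumed here.

```python
import json


def allowed_address_pairs_body(virtual_ips):
    """Build the JSON body of a Neutron port update that allows extra IPs.

    `virtual_ips` is a list of IP addresses (strings) that the port should
    be allowed to use in addition to its own fixed IP.
    """
    pairs = [{"ip_address": ip} for ip in virtual_ips]
    return {"port": {"allowed_address_pairs": pairs}}


# Placeholder virtual IP from the documentation range 192.0.2.0/24.
body = allowed_address_pairs_body(["192.0.2.100"])
print(json.dumps(body, indent=2))
```

The same body would have to be applied to the port of every instance that may take over the virtual IP.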

All of these methods have the considerable disadvantages mentioned above.

So, what is the best solution for VM high availability in OpenStack?

Are there any better methods?

Thanks in advance.

(By "pets", I mean stateful legacy applications that are not cloud-ready.)




Have you looked at oVirt? It uses KVM as the hypervisor and can be used for "pets" and "goldfish".


Andris Lismanis ( 2015-07-14 11:42:29 -0600 )

If you are still looking for OpenStack instance HA, please visit and send us an email. We have developed an HA tool that is lightweight and scales to a theoretically unlimited number of nodes. It also has host health-check capabilities to move VMs before a node goes down (i.e. a sick node).

Rick Kundiger ( 2016-10-10 17:15:56 -0600 )

8 answers


answered 2015-02-05 07:51:30 -0600

updated 2015-02-05 12:18:42 -0600


Providing high availability for pets instances is a gap in OpenStack, and commercial companies do business by filling it. They build their own OpenStack-based solutions to deliver advanced scheduling and high availability features.

An important question: why does the OpenStack community show no interest in devising a production-proven project to address the problem of pets instance high availability? The answer is simple. They believe that, in the context of cloud, applications must be self-recovering and resilient, so that they can run on unreliable cloud infrastructure in a highly available manner.

As for the solutions you mentioned, I think the first one is the most straightforward, because:

  • Heat/HA: I found no success story for this method, and all of the HA-related topics in the Heat wiki pages are under heavy development and are not stable.
  • Neutron and allowed address pairs: as you hinted, this method is best suited to stateless instances and needs extra work to provide pets instance HA.

I found no ready-to-use implementation of method #1. To implement it, a watch-and-react script needs to carry out three steps:

  1. Detecting: detect the host failure using available tools (e.g. Nagios or Pacemaker).
  2. Fencing: isolate the failed node, to prevent the same instance from running on two hosts and to avoid data corruption in Cinder volumes.
  3. Recovering: after detection and fencing, recover the failed instances by starting them with the nova evacuate command on another healthy host (a backup host).
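The three steps above can be sketched roughly as follows. This is only an illustration under assumptions, not a tested HA tool: detection is assumed to have already flagged the host, `recover_host` and the `fence` callable are hypothetical names (fencing depends entirely on your hardware, e.g. IPMI or a power switch), and `nova_evacuate` simply shells out to the nova CLI, whose extra flags vary between releases.

```python
import subprocess


def nova_evacuate(server, target_host=None):
    """Run `nova evacuate <server> [<host>]`; return True if the CLI exits with 0."""
    cmd = ["nova", "evacuate", server]
    if target_host:
        cmd.append(target_host)
    return subprocess.run(cmd).returncode == 0


def recover_host(failed_host, servers, fence, evacuate=nova_evacuate):
    """Fence a failed host, then evacuate its instances.

    `fence` is a callable(host) -> bool supplied by the operator (hypothetical).
    Returns the list of servers whose evacuation was accepted.
    """
    # Step 2: never evacuate unless fencing is confirmed, otherwise the same
    # instance could end up running on two hosts against the same volumes.
    if not fence(failed_host):
        return []
    # Step 3: evacuate each instance that was running on the failed host.
    return [server for server in servers if evacuate(server)]


# Dry run with stand-in callables (no real fencing and no Nova calls).
result = recover_host(
    "compute1",
    ["vm-a", "vm-b"],
    fence=lambda host: True,
    evacuate=lambda server: True,
)
print(result)
```

Passing the fencing and evacuation steps in as callables keeps the ordering rule (fence first, then recover) in one place and lets the script be tested without touching a real cloud.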

The major requirements of this method are as follows:

  • Setting up shared storage between the compute nodes, using for example GlusterFS or CephFS, to share the Nova instances directory (/var/lib/nova/instances by default).
  • Designating two or more compute nodes as backup nodes (ready to run the failed instances after a failure is detected).

Hope this helps.




Hi Mzoorikh. Thanks for your useful answer.

Moh ( 2015-02-05 12:50:17 -0600 )

answered 2015-08-19 07:35:45 -0600

keerthivasanselvaraj

updated 2015-08-19 07:36:31 -0600

Try Masakari for virtual machine high availability. It will be really helpful.


answered 2015-02-02 00:17:46 -0600

You can use vSphere (ESX) from VMware as a hypervisor. vSphere has HA capabilities for host failures and will automatically restart your pets on another hypervisor.



As mentioned in my post, I don't want to use any commercial products such as VMware solutions.

Moh ( 2015-02-02 07:53:29 -0600 )

I am looking for an open-source one.

Moh ( 2015-02-02 08:06:59 -0600 )

Hello, the way you wrote it was at least misleading (to me also): ...when non-commercial hypervisors (e.g. ESXi)...

vincent-legoll ( 2015-08-19 07:47:13 -0600 )

answered 2015-02-03 09:14:08 -0600

NoNoNoo

I don't think there is a single best solution yet for VM high availability in OpenStack. You have listed some solutions, but each should be checked in its own context to verify that it is valid.

  • About the "nova evacuate" command not being an automatic way of instance fail-over: you can develop custom software that detects compute node failure (for example with Nagios, a monitoring tool) and automatically executes "nova evacuate". I have read something about this solution here
  • For HA MySQL you can try to set up a MariaDB Galera cluster with HAProxy and Keepalived; here there is a tutorial for CentOS 6; I haven't tried it on OpenStack.
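As a rough illustration of the detection part of the first bullet, a monitoring check could parse the compute service list and report hosts whose nova-compute service is down. The sketch below assumes JSON as produced by `openstack compute service list -f json`, with the keys "Binary", "Host" and "State"; those key names and the sample data are assumptions to verify against your client version, and `down_compute_hosts` is a hypothetical helper name.

```python
import json


def down_compute_hosts(service_list_json):
    """Return the sorted host names whose nova-compute service is reported down."""
    services = json.loads(service_list_json)
    return sorted(
        {
            svc["Host"]
            for svc in services
            if svc.get("Binary") == "nova-compute" and svc.get("State") == "down"
        }
    )


# Made-up sample data in the assumed format.
sample = json.dumps(
    [
        {"Binary": "nova-compute", "Host": "compute1", "State": "up"},
        {"Binary": "nova-compute", "Host": "compute2", "State": "down"},
        {"Binary": "nova-scheduler", "Host": "ctrl1", "State": "down"},
    ]
)
print(down_compute_hosts(sample))
```

A "down" state only means the service stopped reporting in; the host itself may still be running instances, so it should not be treated as confirmed failure before "nova evacuate" is triggered.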


Thanks. I think the first method (using nova evacuate + monitoring tools) is the most straightforward mechanism.

Moh ( 2015-02-03 09:37:45 -0600 )

Yes, I thought about it, but I didn't find any open-source/commercial/draft solution that implements it. I think it could be useful for enterprise use of OpenStack. If you find more details about this solution, please share them with me. Thanks.

NoNoNoo ( 2015-02-03 09:41:45 -0600 )

answered 2017-08-23 05:47:07 -0600

Fatemeh Abdollahei

According to the OpenStack docs, as of September 2016 the OpenStack High Availability community is designing and developing an official and unified way to provide high availability for instances. You can see details here: (


answered 2015-07-13 09:09:36 -0600

keky

I was confused about how to use the scheduler API to pick a suitable host so that we can migrate all the failed instances to it.


answered 2015-03-08 06:32:52 -0600

Kamil Babayev

Good day. I think it is not right to demand everything from application design. For example, I have 10 compute nodes and have installed Postfix as a mail server in two instances on two nodes. If both of those nodes fail one after another, my system will stop. It would be nice to simply restart the instance on another compute node. Can you please clarify this? Why does OpenStack not include such an important feature? I am so disappointed.


answered 2018-05-16 06:34:07 -0600


I have configured a Windows failover cluster (3 nodes) with the SQL AlwaysOn feature, which does not require shared storage or a quorum (Node Majority). However, I am having trouble reaching the cluster IP from the passive node, or even from other servers in the same subnet. When I telnet to the cluster service port, my request reaches the cluster IP from the subnet, but it does not respond. It seems something is missing on the OpenStack side to allow communication for multiple IPs (the failover cluster IPs) over the network. These IPs are manually (statically) assigned in the cluster configuration.

Please advise if anything else needs to be done on the OpenStack side to allow this communication.



Asked: 2015-02-01 14:04:44 -0600

Seen: 10,715 times

Last updated: Aug 19 '15