# Pacemaker or Nagios for high availability?

Hi,

I have a question. What is the best way to get high availability for the controller and network nodes? I want to double these nodes, so that if one dies, I can start the second. I have seen Pacemaker in a lot of articles, but I don't know if it's a good idea for a production environment. Or can I do that with Nagios?

Thanks for your help. Sorry for my English, L.L

We're running a 3-node active/active setup with Pacemaker in production and there have been no problems so far. Of course, setting it up was a little tricky at times, but it was well worth the effort.

(2016-02-19 08:01:30 -0600)

UPDATE 02/19/2016 20:16 MSK
Regarding http://keithtenzer.com/2015/01/26/ope...
After some googling I noticed that RH provides ACTIVE/ACTIVE HA controller solutions based on Pacemaker. So, regardless of the traditional thinking that Pacemaker implies ACTIVE/PASSIVE, I don't know what kind of HA 3-node controller gets deployed to the overcloud by TripleO (RDO Manager).
END UPDATE

As far as I understand, a 3-node HA controller with Pacemaker/Corosync is an ACTIVE/PASSIVE approach. Personally, I got a very good impression following the HAProxy/Keepalived 3-node HA controller (ACTIVE/ACTIVE) scheme on RDO Liberty. See: https://github.com/beekhof/osp-ha-dep...
It is active/active because load is balanced across 3 controller nodes running simultaneously.
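To make "load balanced across 3 controller nodes running simultaneously" concrete, here is a minimal Python sketch of round-robin dispatch over healthy backends, the kind of behaviour HAProxy provides in front of the controllers' API endpoints. It is a toy model, not HAProxy's code or configuration: the node names and the `RoundRobinBalancer` class are hypothetical, and real health checks are reduced to manual `mark_down`/`mark_up` calls.

```python
from itertools import cycle

# Hypothetical controller names; in a real deployment HAProxy sits in front
# of the API endpoints running on each of these nodes.
CONTROLLERS = ["controller1", "controller2", "controller3"]


class RoundRobinBalancer:
    """Simplified active/active dispatch: every healthy node takes requests in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # Stand-in for a failed health check.
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def pick(self):
        # Walk the ring at most once, skipping nodes that are marked down.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy controller available")


lb = RoundRobinBalancer(CONTROLLERS)
print([lb.pick() for _ in range(6)])
# ['controller1', 'controller2', 'controller3', 'controller1', 'controller2', 'controller3']
lb.mark_down("controller2")
print([lb.pick() for _ in range(4)])
# ['controller1', 'controller3', 'controller1', 'controller3']
```

The point of the sketch is that all three nodes serve traffic at the same time, and losing one only shrinks the pool instead of triggering a failover.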

Yes, that is correct. MariaDB Galera running multi-master synchronous replication resolves the problem of deadlocks that happen when several services try to update the same record, by choosing a victim transaction and rolling it back. A Neutron router in HA mode (VRRP) is based on two of the three nodes, so one node of this pair is always Master and the other is Backup (a keepalived pair). I understood why the third controller is required when the Master went down and the Backup replaced it.

Personally, I do not regret the hours spent on manual setup instead of the TripleO (RDO Manager) and Ansible approach to automated undercloud and overcloud deployment (at least for the time being). I want to understand what I am doing instead of relying on prepared YAML templates, because if anything fails in the future I am supposed to be able to troubleshoot the problem, which is much easier when you clearly understand what you actually did. I realize that TripleO and Ansible will pretty soon become the standard for production deployments, no matter what kind of HA is provided for the controllers. The recent message from Larsks on the RDO mailing list is convincing enough for me. RH's next victory is just around the corner.
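The Master/Backup behaviour of the keepalived pair can be sketched as a VRRP-style election. This is a simplified model assuming the RFC 5798 rules (highest priority wins, ties go to the higher primary IP address); real keepalived nodes reach this outcome by exchanging advertisements rather than by calling a central function, and the router names, priorities and addresses below are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Router:
    name: str          # hypothetical node name
    priority: int      # VRRP priority: 1-254 for backups, 255 is reserved for the address owner
    address: str       # primary IPv4 address, used as the tie-breaker
    alive: bool = True


def elect_master(routers):
    """Return the live router that should be VRRP Master, or None if none is alive.

    Simplified: highest priority wins, and equal priorities are broken by the
    higher primary IP address.
    """
    live = [r for r in routers if r.alive]
    if not live:
        return None
    return max(live, key=lambda r: (r.priority, ipaddress.ip_address(r.address)))


pair = [Router("controller1", 100, "192.168.0.11"),
        Router("controller2", 100, "192.168.0.12")]
print(elect_master(pair).name)  # controller2 (equal priority, higher address)
pair[1].alive = False           # the Master goes down...
print(elect_master(pair).name)  # ...and controller1, the Backup, takes over
```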

OK, so I need to decide what I want for my architecture: active/passive with Pacemaker or active/active with Keepalived. If I understand correctly, in active/passive, when a service fails, Pacemaker moves it to another host. In active/active, does the service run on more than one node at once? Is that correct? ^^'
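The difference asked about here can be sketched as two placement rules. This is a toy model under stated assumptions, not Pacemaker's scheduler (which, as noted earlier in the thread, can run active/active setups too): the helper names and node names are hypothetical, and the `healthy` set stands in for whatever monitoring decides that a node is up.

```python
def place_active_passive(nodes, healthy, current=None):
    """Active/passive: the service runs on exactly one node.

    Keep it where it is while that node is healthy; otherwise fail over to the
    first healthy node in preference order. Returns a list with at most one node.
    """
    if current in healthy:
        return [current]
    for node in nodes:
        if node in healthy:
            return [node]
    return []


def place_active_active(nodes, healthy):
    """Active/active: one instance of the service on every healthy node."""
    return [node for node in nodes if node in healthy]


nodes = ["controller1", "controller2", "controller3"]
healthy = set(nodes)
print(place_active_passive(nodes, healthy))  # ['controller1']
print(place_active_active(nodes, healthy))   # ['controller1', 'controller2', 'controller3']

healthy.discard("controller1")               # the active node dies
print(place_active_passive(nodes, healthy, current="controller1"))  # ['controller2']
print(place_active_active(nodes, healthy))   # ['controller2', 'controller3']
```

In the active/passive case a failure means a move (and a short outage while the service restarts elsewhere); in the active/active case the surviving instances simply keep serving.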

(2016-02-19 06:47:36 -0600)

@dbaxps Thanks for the reply. I was happy when I deployed my OpenStack successfully, but I see now that was the easy stage. The hard part is HA... I think it's not a bad choice to use Ansible and TripleO for my HA OpenStack deployment. But I'm in trouble: I have 4 nodes, 1 controller, 1 network node and 2 computes. In the Ansible RDO docs I see 3 controllers and 1 compute, and you talk about a 3-controller architecture. So is there no network node in an HA deployment? Are the L3, ML2, DHCP, etc. agents all on the controllers?

So personally, you propose Ansible & TripleO because it's easier to understand what you are doing with them, and in case of failure you can repair the problem quickly? It's going to be a reference. And A/A with Keepalived because the services run simultaneously. Two of you propose A/A HA, so given that, and after some googling, I want to use A/A for my architecture.

Hmm... Sorry, but what do you mean by RH? ><'

EDIT: Sorry for the late answer, I don't work on weekends.

You misunderstood me. I never proposed TripleO && Ansible; I proposed:
https://github.com/beekhof/osp-ha-dep...

(2016-02-22 03:12:38 -0600)

"So personnaly, you propose Ansible & Triple0 because it's easier to understand what u do with, and in case of failure, u can repare the problem rapidely": NO. Just the opposite. Following the link from github.com, it is easier for me to troubleshoot.

(2016-02-22 03:15:30 -0600)

RH stands for Red Hat Inc.

(2016-02-22 03:18:23 -0600)

OK, OK. Thanks a lot for this information. I'll go through your link. Bad English, sorry >< Bye.

(2016-02-22 03:52:41 -0600)

