
Unable to start Corosync Cluster Engine

asked 2016-05-02 07:18:08 -0500

mr.andersons

updated 2016-05-04 05:46:41 -0500

I'm trying to create an HA OpenStack cluster for the controller nodes by following the OpenStack HA guide (http://docs.openstack.org/ha-guide/).

I have three nodes in the cluster:

  • controller-0
  • controller-1
  • controller-2

I installed the packages and set a password for the hacluster user on each host:

[root@controller-0 ~]# yum install pacemaker pcs corosync libqb fence-agents-all resource-agents -y

Using that password, I authenticated to all the nodes that should make up the cluster:

[root@controller-0 ~]# pcs cluster auth controller-0 controller-1 controller-2 -u hacluster -p password --force  
controller-2: Authorized
controller-1: Authorized
controller-0: Authorized

After that, I created the cluster:

[root@controller-1 ~]# pcs cluster setup --force --name ha-controller controller-0 controller-1 controller-2
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
controller-0: Succeeded
controller-1: Succeeded
controller-2: Succeeded
Synchronizing pcsd certificates on nodes controller-0, controller-1 controller-2...
controller-2: Success
controller-1: Success
controller-0: Success
Restaring pcsd on the nodes in order to reload the certificates...
controller-2: Success
controller-1: Success
controller-0: Success

Then I started the cluster:

[root@controller-0 ~]# pcs cluster start --all
controller-0:
controller-2:
controller-1:

But when I start corosync, I get:

[root@controller-0 ~]# systemctl start corosync
Job for corosync.service failed because the control process exited with error code. 
See "systemctl status corosync.service" and "journalctl -xe" for details.

The messages log shows:

controller-0 systemd: Starting Corosync Cluster Engine...
controller-0 corosync[23538]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
controller-0 corosync[23538]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
controller-0 corosync[23539]: [TOTEM ] Initializing transport (UDP/IP Unicast).
controller-0 corosync[23539]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
controller-0 corosync: Starting Corosync Cluster Engine (corosync): [FAILED]
controller-0 systemd: corosync.service: control process exited, code=exited status=1
controller-0 systemd: Failed to start Corosync Cluster Engine.
controller-0 systemd: Unit corosync.service entered failed state.
controller-0 systemd: corosync.service failed.

My corosync config file:

[root@controller-0 ~]# cat /etc/corosync/corosync.conf    
totem {   
    version: 2    
    secauth: off    
    cluster_name: ha-controller    
    transport: udpu    
}    
nodelist {    
    node {    
        ring0_addr: controller-0    
        nodeid: 1     
    }
    node {
        ring0_addr: controller-1
        nodeid: 2
    }
    node {
        ring0_addr: controller-2
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 3
    wait_for_all: 1
    last_man_standing: 1
    last_man_standing_window: 10000
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

All of the node names are resolvable.
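
One way to double-check name resolution on each node is to ask the system resolver directly. The helper below is only a sketch: the function name check_nodes_resolve is made up for illustration (it is not part of pcs or corosync), and it assumes getent is available, as it normally is on CentOS.

```shell
# Hypothetical helper (not part of pcs or corosync): report any node name
# that the system resolver cannot look up. Returns non-zero if one fails.
check_nodes_resolve() {
    local rc=0 node
    for node in "$@"; do
        if getent hosts "$node" >/dev/null; then
            echo "$node: resolves"
        else
            echo "$node: does NOT resolve"
            rc=1
        fi
    done
    return "$rc"
}
```

It would be called as check_nodes_resolve controller-0 controller-1 controller-2 on every node, since each node needs to resolve all of the others.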

OS is CentOS Linux release 7.2.1511 (Core)

[root@controller-0 ~]# uname -a
Linux controller-0 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Installed versions:

pacemaker.x86_64                1.1.13-10.el7_2.2   @updates
pacemaker-cli.x86_64            1.1.13-10.el7_2.2   @updates
pacemaker-cluster-libs.x86_64   1.1.13-10.el7_2.2   @updates
pacemaker-libs.x86_64           1.1.13-10.el7_2.2   @updates
corosync.x86_64                 2.3.4-7.el7_2.1     @updates
corosynclib.x86_64              2.3.4-7.el7_2.1     @updates
libqb.x86_64                    0.17.1-2.el7.1      @updates
fence-agents-all.x86_64         4.0.11-27.el7_2.7   @updates
resource-agents.x86_64          3.9.5-54.el7_2.9    @updates

UPDATE:
I tried again on a clean install, with the same result. Everything went well until starting the cluster:

[root@controller-0 ~]# pcs cluster start --all
controller-0: Starting Cluster ...

Comments

Have you configured the firewall? I followed this guide and everything worked: http://clusterlabs.org/doc/en-US/Pace...

Could you give it another shot?

yprokule (2016-05-10 04:14:20 -0500)

I tried it with and without the firewall, but no luck. I will look at the ClusterLabs guide.

mr.andersons (2016-05-11 06:19:23 -0500)

@mr.andersons - any luck? Have you managed to get it working?

yprokule (2016-05-25 01:09:56 -0500)

In my experience, the HA Guide is outdated and not really helpful. There were many steps that I had to figure out myself by digging through log files and blog posts and then merging all that information.

eblock (2018-04-26 03:37:40 -0500)

3 answers

1

answered 2016-05-02 14:23:26 -0500

larsks

updated 2016-05-03 08:28:04 -0500

I'm running on CentOS Linux release 7.2.1511 (Core) and I seem to have the same package versions that you have. I'm not able to reproduce your problem. You can see a complete recording of my session here:

The point where your session seems to go awry is this step:

[root@controller-0 ~]# pcs cluster start --all
controller-0:
controller-2:
controller-1:

Which ought to look like this:

[root@controller-0 ~]# pcs cluster start --all
controller-1: Starting Cluster...
controller-0: Starting Cluster...
controller-2: Starting Cluster...

You probably want to look for additional diagnostics in:

  • The journal for pacemaker and corosync:

    journalctl -u pacemaker

    Or:

    journalctl -u corosync

  • The contents of /var/log/cluster/corosync.log

If you spot anything there, maybe update your question with the new information.
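
To narrow down a long log, something like the following can help. This is only a sketch: scan_cluster_log is a made-up helper name, and the keyword list (error, warn, crit, fail) is a guess at what is worth looking for, not an official list of corosync message levels.

```shell
# Hypothetical helper: print lines from a log file that mention errors,
# warnings or failures, prefixed with their line numbers.
# Defaults to the logfile path set in the corosync.conf from the question.
scan_cluster_log() {
    local logfile=${1:-/var/log/cluster/corosync.log}
    grep -n -i -E 'error|warn|crit|fail' "$logfile"
}
```

Running scan_cluster_log with no argument reads /var/log/cluster/corosync.log; the same grep filter can be applied to the journal with journalctl -u corosync -u pacemaker --no-pager piped into it.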

Also, note that yprokule is totally correct: pcs cluster start --all starts corosync for you; you shouldn't need to start it manually. You can infer this from the session recording I posted, but it's good to be explicit.

Update

Note that the current HA guide is missing an important step. It's not directly relevant to your issue, but you'll run into it once you get things working.

When you run pcs cluster start --all, pcsd starts corosync and pacemaker on all the cluster nodes. It does not enable them persistently, which means that cluster services will not come up when you reboot a node.

You'll want to also run:

pcs cluster enable --all
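
To confirm afterwards that the services will come up at boot, you could check them on each node with systemctl. The function below is a sketch only; cluster_services_enabled is a made-up name, and it assumes the units are called corosync and pacemaker, as they are in the logs above.

```shell
# Hypothetical helper: report whether corosync and pacemaker are enabled
# at boot on the local node. Returns non-zero if either one is not enabled.
cluster_services_enabled() {
    local rc=0 svc
    for svc in corosync pacemaker; do
        if systemctl is-enabled "$svc" >/dev/null 2>&1; then
            echo "$svc: enabled"
        else
            echo "$svc: not enabled"
            rc=1
        fi
    done
    return "$rc"
}
```

Run cluster_services_enabled on every controller; it only inspects the local node.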
0

answered 2016-05-02 16:18:27 -0500

yprokule

IIRC, you don't need to start corosync manually; it is started by pcs cluster start --all.

It looks like your command failed because you tried to start it manually. According to your log, the service had already started when the new start was issued.

Could you just run pcs cluster start --all and observe what happens?

Otherwise, follow the steps from @larsks.

0

answered 2018-04-25 05:47:59 -0500

I ran into a similar issue with a different setup. Please check your /etc/hosts file for an entry that maps 127.0.0.1 to the node's hostname. If it's there, remove the entry and then try creating and starting the cluster again.
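
A quick way to look for such an entry is sketched below. The function name loopback_entries_for is made up for illustration, and treating every 127.x.x.x address and ::1 as loopback is my assumption about which entries matter here.

```shell
# Hypothetical helper: print the lines of a hosts file that map a loopback
# address (127.x.x.x or ::1) to the given hostname. Comments are ignored
# and the hostname comparison is case-insensitive.
loopback_entries_for() {
    local name=$1 hosts=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/[ \t]*#.*/, "") }                 # drop comments
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) { print; break }
        }
    ' "$hosts"
}
```

For example, loopback_entries_for controller-0 reads /etc/hosts; empty output means no loopback line names that host.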


Stats

Asked: 2016-05-02 07:18:08 -0500

Seen: 8,604 times

Last updated: Apr 25 '18