Galera breaks, then all the controllers go down

These controller nodes have not been rebooted or stopped. We install a cluster and get it running, but after a few weeks of use Galera appears to fall out of sync (that is my guess) and the cluster ends up in the state shown below, with the entire stack no longer working. Can anyone suggest how we can get to the bottom of this? We have seen it on multiple clusters, on both Liberty and Mitaka.

[root@overcloud-controller-1 log]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Mar  1 06:01:15 2017          Last change: Wed Mar  1 05:50:30 2017 by hacluster via crmd on overcloud-controller-0

2 nodes and 87 resources configured

Online: [ overcloud-controller-1 ]
OFFLINE: [ overcloud-controller-0 ]

Full list of resources:

 ip-172.16.0.10 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-172.18.0.10 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-10.1.32.80  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 Master/Slave Set: galera-master [galera]
     Slaves: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 ip-172.16.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: openstack-core-clone [openstack-core]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 ip-172.22.0.22 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-172.19.0.10 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-1 ]
     Stopped: [ overcloud-controller-0 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Stopped
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-sahara-api-clone [openstack-sahara-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: delay-clone [delay]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-server-clone [neutron-server]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: httpd-clone [httpd]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]

Failed Actions:
* openstack-heat-engine_start_0 on overcloud-controller-1 'not running' (7): call=303, status=complete, exitreason='none',
    last-rc-change='Mon Feb  6 23:19:13 2017', queued=0ms, exec=2075ms
* openstack-nova-scheduler_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=321, status=complete, exitreason='none',
    last-rc-change='Wed Feb  8 21:47:43 2017', queued=0ms, exec=0ms
* rabbitmq_monitor_10000 on overcloud-controller-1 'not running' (7): call=259, status=complete, exitreason='none',
    last-rc-change='Wed Mar  1 02:45:20 2017', queued=0ms, exec=0ms
* openstack-gnocchi-statsd_start_0 on overcloud-controller-1 'not running' (7): call=635, status=complete, exitreason='none',
    last-rc-change='Tue Feb 28 23:43:13 2017', queued=0ms, exec=2100ms
* neutron-server_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=498, status=complete, exitreason='none',
    last-rc-change='Wed Feb 15 09:07:38 2017', queued=0ms, exec=0ms
* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=520, status=complete, exitreason='none',
    last-rc-change='Wed Feb 15 14:36:52 2017', queued=0ms, exec=0ms
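
The failed actions above are not listed in time order, which makes it hard to see what broke first. Below is a minimal Python sketch, not part of any Pacemaker tooling, that pulls the operation, node, return code and `last-rc-change` timestamp out of a "Failed Actions" block and sorts the entries oldest first. The function name `failed_actions` and the regular expression are my own and assume the line layout shown in this post; other pcs versions may format these lines differently. `SAMPLE` holds two entries copied from the output above.

```python
import re
from datetime import datetime

# Two entries copied verbatim from the "Failed Actions" output above.
SAMPLE = """\
* openstack-heat-engine_start_0 on overcloud-controller-1 'not running' (7): call=303, status=complete, exitreason='none',
    last-rc-change='Mon Feb  6 23:19:13 2017', queued=0ms, exec=2075ms
* rabbitmq_monitor_10000 on overcloud-controller-1 'not running' (7): call=259, status=complete, exitreason='none',
    last-rc-change='Wed Mar  1 02:45:20 2017', queued=0ms, exec=0ms
"""

# Hypothetical pattern: assumes each entry looks like the lines in this post.
ENTRY = re.compile(
    r"\*\s+(?P<op>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\):"
    r".*?last-rc-change='(?P<when>[^']+)'",
    re.DOTALL,
)


def failed_actions(text):
    """Return the failed actions found in text as dicts, oldest first."""
    rows = []
    for match in ENTRY.finditer(text):
        row = match.groupdict()
        # Collapse the padding pcs puts in front of single-digit days.
        stamp = " ".join(row["when"].split())
        row["when"] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        row["rc"] = int(row["rc"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["when"])


if __name__ == "__main__":
    for r in failed_actions(SAMPLE):
        print(r["when"].isoformat(), r["node"], r["op"], r["reason"], r["rc"])
```

Run against the full list, the oldest entry is the `openstack-heat-engine` start failure from Feb 6, several weeks before the `rabbitmq` monitor failure on Mar 1, so the logs from around that earlier date may be worth checking as well.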