pacemaker can't start resources with single controller online

Hello,

I have three controllers in my environment, deployed by Mirantis Fuel. Today I noticed that when one of the three controllers is down, Pacemaker works perfectly, but when two of the three are down, nearly all pcs resources are stopped (see the output below).

I can't find the reason for this. Do I need to change some properties in my cluster? I would be grateful for any help.

# pcs status
Cluster name:
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Wed Jun 28 15:14:39 2017          Last change: Wed Jun 28 14:55:45 2017 by hacluster via crmd on node-1.snetcloud.com
Stack: corosync
Current DC: node-13.snetcloud.com (version 1.1.14-70404b0) - partition WITHOUT quorum
3 nodes and 50 resources configured

Online: [ node-13.snetcloud.com ]
OFFLINE: [ node-1.snetcloud.com node-14.snetcloud.com ]

Full list of resources:

 Clone Set: clone_p_vrouter [p_vrouter]
     Stopped: [ node-13.snetcloud.com ]
 vip__management        (ocf::fuel:ns_IPaddr2): Stopped
 vip__zbx_vip_mgmt      (ocf::fuel:ns_IPaddr2): Stopped
 vip__vrouter_pub       (ocf::fuel:ns_IPaddr2): Stopped
 vip__vrouter   (ocf::fuel:ns_IPaddr2): Stopped
 vip__public    (ocf::fuel:ns_IPaddr2): Stopped
 Clone Set: clone_p_haproxy [p_haproxy]
     Stopped: [ node-13.snetcloud.com ]
 sysinfo_node-14.snetcloud.com  (ocf::pacemaker:SysInfo):       Stopped
 Master/Slave Set: master_p_conntrackd [p_conntrackd]
     Stopped: [ node-13.snetcloud.com ]
 sysinfo_node-13.snetcloud.com  (ocf::pacemaker:SysInfo):       Stopped
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Slaves: [ node-13.snetcloud.com ]
 Clone Set: clone_p_mysqld [p_mysqld]
     Started: [ node-13.snetcloud.com ]
 Clone Set: clone_p_dns [p_dns]
     Stopped: [ node-13.snetcloud.com ]
 sysinfo_node-1.snetcloud.com   (ocf::pacemaker:SysInfo):       Stopped
 p_aodh-evaluator       (ocf::fuel:aodh-evaluator):     Stopped
 p_ceilometer-agent-central     (ocf::fuel:ceilometer-agent-central):   Stopped
 Clone Set: clone_p_heat-engine [p_heat-engine]
     Stopped: [ node-13.snetcloud.com ]
 Clone Set: clone_neutron-openvswitch-agent [neutron-openvswitch-agent]
     Stopped: [ node-13.snetcloud.com ]
 Clone Set: clone_neutron-l3-agent [neutron-l3-agent]
     Stopped: [ node-13.snetcloud.com ]
 Clone Set: clone_neutron-metadata-agent [neutron-metadata-agent]
     Stopped: [ node-13.snetcloud.com ]
 Clone Set: clone_neutron-dhcp-agent [neutron-dhcp-agent]
     Stopped: [ node-13.snetcloud.com ]
 Clone Set: clone_p_ntp [p_ntp]
     Stopped: [ node-13.snetcloud.com ]
 p_zabbix-server        (ocf::fuel:zabbix-server):      Stopped
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Stopped: [ node-13.snetcloud.com ]

PCSD Status:
  *Unknown* (172.29.0.10): Offline
  node-13.snetcloud.com member (172.29.0.11): Offline
  *Unknown* (172.29.0.9): Offline

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-recheck-interval: 190s
 dc-version: 1.1.14-70404b0
 have-watchdog: false
 last-lrm-refresh: 1498654072
 node-health-strategy: migrate-on-red
 start-failure-is-fatal: false
 stonith-enabled: false
 symmetric-cluster: false
 UID/GID: uid=hacluster gid=haclient

pcs config:

https://pastebin.com/raw/zL7M37Cq
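
A note on the "partition WITHOUT quorum" line in the status output above: the sketch below works through the vote arithmetic for the two scenarios described (one controller down, two controllers down). It assumes a plain majority rule with one vote per node and no special corosync votequorum options; this has not been checked against the actual Fuel configuration. The function name `has_quorum` and the variable names are made up for illustration.

```shell
#!/bin/sh
# Majority-quorum sketch. Assumption: one vote per node, simple majority,
# no extra votequorum options (not verified for this Fuel deployment).

# has_quorum TOTAL ONLINE
# Succeeds when ONLINE nodes hold a strict majority of TOTAL votes.
has_quorum() {
    total=$1
    online=$2
    needed=$(( total / 2 + 1 ))   # smallest strict majority
    [ "$online" -ge "$needed" ]
}

total_nodes=3   # "3 nodes and 50 resources configured"

for online_nodes in 2 1; do
    if has_quorum "$total_nodes" "$online_nodes"; then
        echo "online=$online_nodes of $total_nodes: quorum"
    else
        echo "online=$online_nodes of $total_nodes: no quorum"
    fi
done
```

Under these assumptions, two online nodes out of three keep quorum and a single online node does not, which matches the behaviour described. Pacemaker's `no-quorum-policy` cluster property (default `stop`) controls what happens to resources in a partition without quorum. It is not set in the property list above, and `corosync-quorumtool -s` on the surviving node should show the current vote counts.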
