crm resources stopped
I have an HA environment built with Mirantis Fuel, with 3 controllers and 2 compute nodes. The crm resources go down unexpectedly: when I run crm_mon, the controller nodes still have quorum, but no resources are shown as active. To recover, I have to manually stop the pacemaker service on the controller that is acting as DC and restart the services. I have been trying to find the cause in the Pacemaker logs, but nothing obvious stands out. Here is an extract from pacemaker.log:
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Applying automated node health strategy: migrate-on-red
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl1. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl2. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl3. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_vrouter [p_vrouter]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__management (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__public (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_haproxy [p_haproxy]
Mar 11 05 ...
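
Since the full log is very noisy, one way to narrow it down might be to pull out only the apply_system_health lines and list the combined health score reported for each node. The following is a minimal sketch, not a tested tool: the function names are hypothetical, the regular expression assumes the exact message wording seen in the extract above, and the log path is whatever is passed on the command line.

```python
import re
import sys

# Assumes the pengine message format seen in the extract, e.g.
# "... apply_system_health: Node ctrl1. has an combined system health of -1000000"
HEALTH_RE = re.compile(
    r"apply_system_health:\s+Node\s+(?P<node>\S+)\s+"
    r"has an? combined system health of\s+(?P<score>-?\d+)"
)


def node_health_scores(log_text):
    """Return {node: last reported combined health score} found in log_text."""
    scores = {}
    for match in HEALTH_RE.finditer(log_text):
        # Later entries overwrite earlier ones, so the most recent score wins.
        scores[match.group("node")] = int(match.group("score"))
    return scores


def unhealthy_nodes(log_text):
    """Return the sorted names of nodes whose last reported score is negative."""
    return sorted(
        node for node, score in node_health_scores(log_text).items() if score < 0
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is supplied by the caller; no default location is assumed.
    with open(sys.argv[1], errors="replace") as handle:
        text = handle.read()
    for node, score in sorted(node_health_scores(text).items()):
        print(f"{node}\t{score}")
```

Run against the extract above, this would list all three controllers with a score of -1000000, which is at least a concrete value to search for when comparing against a period in which the resources were running.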