crm resources stopped

asked 2017-03-11 02:17:23 -0600

tirpitz

I have an HA environment built with Mirantis Fuel, with 3 controllers and 2 compute nodes. The crm resources go down unexpectedly: crm_mon still shows the controller nodes in quorum, but it shows no active resources. To recover, I have to manually stop the pacemaker service on the controller acting as DC and restart the services. I have been trying to find the reason in the pacemaker logs, but nothing is clearly visible. Here is an extract from pacemaker.log:

Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Applying automated node health strategy: migrate-on-red
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl1. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl2. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl3. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_vrouter [p_vrouter]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__management (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__public (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_haproxy [p_haproxy]
Mar 11 05 ... (more)

1 answer

answered 2017-03-11 11:24:58 -0600

tirpitz

I found the cause: the node health strategy built into pacemaker. The health check is configured by default to look at node load (CPU, disk), and if the load is higher than the set threshold, the node is marked unhealthy. In my XML configuration, node-health-strategy was set to "migrate-on-red" (the same value the pengine reports in the log extract in the question). With all three controllers at a combined system health of -1000000, no node was left to run the resources, so they all stopped.

<nvpair id="cib-bootstrap-options-node-health-strategy" name="node-health-strategy" value="migrate-on-red"/>
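
For anyone who wants to confirm the same condition from their own pacemaker.log, here is a minimal sketch that pulls out the health strategy and the per-node health scores. The function name `summarize_health` and the regular expressions are illustrative assumptions based only on the wording of the log lines quoted in the question; other pacemaker versions may word these messages differently.

```python
import re

# Patterns modelled on the apply_system_health lines quoted in the question.
# \s+ is used between fields so the match survives wrapped or flattened logs.
STRATEGY = re.compile(
    r"apply_system_health:\s+Applying automated node health strategy:\s+(\S+)"
)
NODE_HEALTH = re.compile(
    r"apply_system_health:\s+Node\s+(\S+)\s+has an combined system health of\s+(-?\d+)"
)


def summarize_health(log_text):
    """Return (strategy, {node: score}) found in pacemaker log text.

    strategy is None when no strategy line is present. When a node is
    reported more than once, the last reported score wins.
    """
    strategies = STRATEGY.findall(log_text)
    strategy = strategies[-1] if strategies else None
    scores = {node: int(score) for node, score in NODE_HEALTH.findall(log_text)}
    return strategy, scores


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python health_summary.py /var/log/pacemaker.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        strategy, scores = summarize_health(handle.read())
    print("strategy:", strategy)
    for node, score in sorted(scores.items()):
        print(node, score)
```

If every node shows a strongly negative score while the strategy is anything other than none, the stopped resources are most likely a consequence of the health check rather than of a quorum problem.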

Once I switched it to none, pacemaker continued to work without issue.
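
The answer does not say how the value was changed. One possible way, sketched here as an assumption rather than as the method actually used, is pacemaker's `crm_attribute` tool with the `crm_config` attribute type. The helper name `health_strategy_command` is hypothetical, and the list of accepted values reflects the pacemaker documentation as far as I know it; please check it against your installed version.

```python
import shlex
import subprocess

# Values pacemaker is documented to accept for node-health-strategy
# (assumption: verify against the documentation for your version).
VALID_STRATEGIES = {"none", "migrate-on-red", "only-green", "progressive", "custom"}


def health_strategy_command(value="none"):
    """Build the crm_attribute argv that sets node-health-strategy cluster-wide."""
    if value not in VALID_STRATEGIES:
        raise ValueError(f"unknown node-health-strategy: {value!r}")
    return [
        "crm_attribute",
        "--type", "crm_config",
        "--name", "node-health-strategy",
        "--update", value,
    ]


if __name__ == "__main__":
    cmd = health_strategy_command("none")
    # Print the command first so it can be reviewed before anything is changed.
    print(shlex.join(cmd))
    # Uncomment to apply on a cluster node (needs root and a running cluster):
    # subprocess.run(cmd, check=True)
```

Setting the strategy to none turns the health check off entirely, so it is worth finding out why every controller was being scored as unhealthy before leaving it that way.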

Stats

Asked: 2017-03-11 02:17:23 -0600

Seen: 288 times

Last updated: Mar 11 '17