crm resources stopped

I have an HA environment built with Mirantis Fuel, with 3 controllers and 2 compute nodes. The crm resources go down unexpectedly: crm_mon still shows the controller nodes in quorum, but no resources are active. To recover, I have to manually stop the pacemaker service on the controller acting as DC and start the services again. I have been trying to find the cause in the Pacemaker logs, but nothing stands out. Here is an extract from pacemaker.log:

Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl1.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl2.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource p_vrouter:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-l3-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-metadata-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: determine_op_status: Operation monitor found resource neutron-dhcp-agent:0 active on ctrl3.
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Applying automated node health strategy: migrate-on-red
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl1. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl2. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl3. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_vrouter [p_vrouter]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__management (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__vrouter (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: vip__public (ocf::fuel:ns_IPaddr2): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_haproxy [p_haproxy]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_mysqld [p_mysqld]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: sysinfo_ctrl3. (ocf::pacemaker:SysInfo): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Master/Slave Set: master_p_conntrackd [p_conntrackd]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: p_aodh-evaluator (ocf::fuel:aodh-evaluator): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: p_ceilometer-agent-central (ocf::fuel:ceilometer-agent-central): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_neutron-l3-agent [neutron-l3-agent]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_neutron-metadata-agent [neutron-metadata-agent]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_heat-engine [p_heat-engine]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_neutron-dhcp-agent [neutron-dhcp-agent]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_dns [p_dns]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: sysinfo_ctrl1. (ocf::pacemaker:SysInfo): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_p_ntp [p_ntp]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: clone_print: Clone Set: clone_ping_vip__public [ping_vip__public]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: short_print: Stopped: [ ctrl1. ctrl2. ctrl3. ]
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_print: sysinfo_ctrl2. (ocf::pacemaker:SysInfo): Stopped
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: clone_p_vrouter: Rolling back scores from clone_p_dns
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: clone_p_vrouter: Rolling back scores from clone_p_ntp
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_vrouter:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_vrouter:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_vrouter:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: clone_p_haproxy: Rolling back scores from vip__management
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: clone_p_haproxy: Rolling back scores from vip__public
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_haproxy:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_haproxy:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_haproxy:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource vip__management cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: vip__vrouter_pub: Rolling back scores from master_p_conntrackd
Mar 11 05:05:56 [14062] ctrl3. pengine: info: rsc_merge_weights: vip__vrouter_pub: Rolling back scores from vip__vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource vip__vrouter_pub cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource vip__vrouter cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource vip__public cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_mysqld:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_mysqld:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_mysqld:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource sysinfo_ctrl3. cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: p_conntrackd:0: Rolling back scores from vip__vrouter_pub
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_conntrackd:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: p_conntrackd:1: Rolling back scores from vip__vrouter_pub
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_conntrackd:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: p_conntrackd:2: Rolling back scores from vip__vrouter_pub
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_conntrackd:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: master_color: master_p_conntrackd: Promoted 0 instances of a possible 1 to master
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_rabbitmq-server:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_rabbitmq-server:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_rabbitmq-server:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: master_color: master_p_rabbitmq-server: Promoted 0 instances of a possible 1 to master
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_aodh-evaluator cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_ceilometer-agent-central cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-l3-agent:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-l3-agent:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-l3-agent:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-metadata-agent:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-metadata-agent:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-metadata-agent:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_heat-engine:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_heat-engine:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_heat-engine:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-dhcp-agent:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-dhcp-agent:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource neutron-dhcp-agent:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_dns:0 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_dns:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_dns:1 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_dns:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_dns:2 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_dns:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource sysinfo_ctrl1. cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_ntp:0 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_ntp:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_ntp:1 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_ntp:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: clone_rsc_colocation_rh: Cannot pair p_ntp:2 with instance of clone_p_vrouter
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_ntp:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource ping_vip__public:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource ping_vip__public:1 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource ping_vip__public:2 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource sysinfo_ctrl2. cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: notice: stage6: Scheduling Node ctrl1. for shutdown
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_vrouter:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_vrouter:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_vrouter:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave vip__management (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave vip__vrouter_pub (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave vip__vrouter (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave vip__public (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_haproxy:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_haproxy:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_haproxy:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_mysqld:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_mysqld:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_mysqld:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave sysinfo_ctrl3. (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_conntrackd:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_conntrackd:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_conntrackd:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_rabbitmq-server:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_rabbitmq-server:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_rabbitmq-server:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_aodh-evaluator (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_ceilometer-agent-central (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-l3-agent:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-l3-agent:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-l3-agent:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-metadata-agent:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-metadata-agent:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-metadata-agent:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_heat-engine:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_heat-engine:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_heat-engine:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-dhcp-agent:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-dhcp-agent:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave neutron-dhcp-agent:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_dns:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_dns:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_dns:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave sysinfo_ctrl1. (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_ntp:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_ntp:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave p_ntp:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave ping_vip__public:0 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave ping_vip__public:1 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave ping_vip__public:2 (Stopped)
Mar 11 05:05:56 [14062] ctrl3. pengine: info: LogActions: Leave sysinfo_ctrl2. (Stopped)

Has anybody faced such an issue? And can someone advise on how to debug Pacemaker?
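As a first triage step, the extract can be condensed into the two things that seem most relevant: the health score the policy engine assigned to each node, and which resources it reported as "cannot run anywhere". The sketch below is only an illustration, not a Pacemaker tool; the function name `summarize` and the two regular expressions are assumptions built from the message formats visible in the extract. A score of -1000000 is Pacemaker's -INFINITY, and the log shows the migrate-on-red health strategy in effect, so a red health attribute on every node may be why nothing can be placed, but that is a hypothesis to verify rather than a confirmed cause.

```python
import re

# Message formats copied from the pacemaker.log extract in this post.
# \s+ is used between tokens so entries that were wrapped mid-line still match.
HEALTH_RE = re.compile(
    r"apply_system_health:\s+Node\s+(\S+)\s+has an combined system health of\s+(-?\d+)"
)
BLOCKED_RE = re.compile(r"native_color:\s+Resource\s+(\S+)\s+cannot run anywhere")


def summarize(log_text):
    """Return ({node: health_score}, sorted list of resources that cannot run)."""
    health = {node: int(score) for node, score in HEALTH_RE.findall(log_text)}
    blocked = sorted(set(BLOCKED_RE.findall(log_text)))
    return health, blocked


# Four entries taken from the extract (one of them wrapped), used as a self-check.
SAMPLE = """\
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl1. has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: apply_system_health: Node ctrl2.
has an combined system health of -1000000
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource p_vrouter:0 cannot run anywhere
Mar 11 05:05:56 [14062] ctrl3. pengine: info: native_color: Resource vip__management cannot run anywhere
"""

if __name__ == "__main__":
    health, blocked = summarize(SAMPLE)
    for node, score in sorted(health.items()):
        print(f"{node} health={score}")
    print(f"{len(blocked)} resources cannot run anywhere")
```

To run it against a real log, read the file contents into a string and pass that to `summarize` instead of `SAMPLE`.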