HARestarter not working: it fires without the VM going down and gets stuck recreating the port
My HARestarter is not working. I am trying to monitor the Heartbeat metric and, based on it, restart the VM if it goes down. I have tried all possible metrics for restarting the host (only when the thresholds are met), but none of them work for me. Can somebody please share a working template that they have actually used?
There are two problems with this:
1. The VM is restarted even though it has not gone down.
2. On restart, it deletes the resources and recreates them. During the recreation it gets stuck with "Port ID could not be found".
The template I am using is:
heat_template_version: 2013-05-23
description: >
  HEAT template for my service
resources:
  wait_handle:
    type: OS::Heat::UpdateWaitConditionHandle
  catalog:
    type: OS::Nova::Server
    properties:
      name: catalog
      image: "centos"
      flavor: "m1.medium"
      key_name: "ucm-dev-keypair"
      networks:
        - port: { get_resource: catalog_port }
      user_data:
        str_replace:
          template: |
            #!/bin/bash -v
            mkdir -p /etc/cfn
            cat <<EOF > /etc/cfn/cfn-credentials
            AWSAccessKeyId=AWS_KEY
            AWSSecretKey=AWS_SECRET
            EOF
            chmod 000400 /etc/cfn/cfn-credentials
            cat <<EOF > /etc/cfn/cfn-hup.conf
            [main]
            stack=STACK_NAME
            credential-file=/etc/cfn/cfn-credentials
            region='nova'
            interval=2
            EOF
            chmod 000600 /etc/cfn/cfn-hup.conf
            cat <<EOF > /tmp/cfn-hup-crontab.txt
            * * * * * /opt/aws/bin/cfn-hup -f
            * * * * * /opt/aws/bin/cfn-push-stats --watch HEARTBEATFAILUREALARM --heartbeat
            EOF
            chmod 000600 /etc/cfn/cfn-credentials
            crontab /tmp/cfn-hup-crontab.txt
            /opt/aws/bin/cfn-signal -e 0 -i 'catalog' "wc_url"
          params:
            wc_url: { get_resource: wait_handle }
            AWS_KEY: { get_resource: WebServerKeys }
            AWS_SECRET: { get_attr: [ WebServerKeys, SecretAccessKey ] }
            STACK_NAME: { get_param: 'OS::stack_name' }
            HEARTBEATFAILUREALARM: { get_resource: ha_alarm }
  catalog_port:
    type: OS::Neutron::Port
    properties:
      network_id: "1a068b22-fd93-462d-b7b7-15f56d5e7f17"
      fixed_ips:
        - subnet_id: "3d7a55d6-4f05-49eb-86d8-7e4cf1b5d650"
  wait_condition:
    type: AWS::CloudFormation::WaitCondition
    #depends_on: wait_handle
    properties:
      Count: 1
      Handle: { get_resource: wait_handle }
      Timeout: 3600
  heatproxy:
    type: AWS::IAM::User
  WebServerKeys:
    type: AWS::IAM::AccessKey
    depends_on: heatproxy
    properties:
      UserName: { get_resource: heatproxy }
  ha_alarm:
    type: AWS::CloudWatch::Alarm
    depends_on: wait_condition
    properties:
      AlarmActions:
        - { get_resource: restart }
      AlarmDescription: Restart the Instance if we miss a heartbeat
      ComparisonOperator: LessThanOrEqualToThreshold
      EvaluationPeriods: '2'
      MetricName: Heartbeat
      Namespace: system/linux
      Period: '60'
      Statistic: Minimum
      Threshold: '0'
  catalog_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network_id: "3dcfc74c-3913-48ca-acae-789a022b7783"
      port_id: { get_resource: catalog_port }
  restart:
    type: OS::Heat::HARestarter
    properties:
      InstanceId: { get_resource: catalog }
outputs:
  catalog_private_ip:
    description: IP address of catalog in private network
    value: { get_attr: [ catalog, first_address ] }
  catalog_public_ip:
    description: Floating IP address of catalog in public network
    value: { get_attr: [ catalog_floating_ip, floating_ip_address ] }
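To make the `ha_alarm` settings in the template concrete, here is a simplified Python model of how they would be evaluated: `Statistic: Minimum`, `ComparisonOperator: LessThanOrEqualToThreshold` with `Threshold: '0'`, checked over `EvaluationPeriods: '2'` periods of `Period: '60'` seconds. It is only a sketch of generic CloudWatch-style alarm semantics, not Heat's actual watch-rule code. Two things are assumptions: that each heartbeat is recorded as a positive datapoint (for example 1), and how a period with no datapoints is treated, which is exposed here as the hypothetical `missing_data_breaches` flag. If missing data does count as a breach, the alarm (and therefore the `restart` action) would fire whenever heartbeats stop arriving, whether or not the VM itself is down.

```python
"""Simplified model of the ha_alarm thresholds from the template.

Assumptions (not taken from Heat's source): generic CloudWatch-style
evaluation, heartbeats recorded as positive values such as 1, and a
hypothetical missing_data_breaches flag for periods with no datapoints.
"""
from typing import List, Optional

PERIOD_SECONDS = 60      # Period: '60' (each inner list models one such window)
EVALUATION_PERIODS = 2   # EvaluationPeriods: '2'
THRESHOLD = 0.0          # Threshold: '0'


def period_minimum(samples: List[float]) -> Optional[float]:
    """Statistic: Minimum. None means no heartbeat arrived in the period."""
    return min(samples) if samples else None


def period_breaches(value: Optional[float], missing_data_breaches: bool) -> bool:
    """ComparisonOperator: LessThanOrEqualToThreshold."""
    if value is None:
        # Hypothetical policy for an empty period (assumption, see docstring).
        return missing_data_breaches
    return value <= THRESHOLD


def should_alarm(periods: List[List[float]], missing_data_breaches: bool = True) -> bool:
    """True if each of the last EVALUATION_PERIODS periods breaches."""
    recent = periods[-EVALUATION_PERIODS:]
    if len(recent) < EVALUATION_PERIODS:
        return False  # not enough history to evaluate yet
    return all(period_breaches(period_minimum(p), missing_data_breaches) for p in recent)


if __name__ == "__main__":
    # One datapoint per minute, as the cfn-push-stats cron entry intends.
    healthy = [[1.0], [1.0], [1.0]]
    # Heartbeats stop arriving (VM down, or the push itself failing).
    silent = [[1.0], [], []]
    print(should_alarm(healthy))                              # False
    print(should_alarm(silent))                               # True
    print(should_alarm(silent, missing_data_breaches=False))  # False
```

In this model the alarm cannot tell a dead VM apart from a running VM whose heartbeat push is failing; both look like empty periods.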
Can you edit your question to add more specific details? 'not working' is not precise enough information for anyone to help you.