
HARestarter Not Working - It gets fired without VM going down + gets stuck on the recreation of Port

asked 2014-06-19 00:35:16 -0500

Jay Mehta

updated 2014-06-23 02:27:06 -0500

My HARestarter is not working. I am monitoring the Heartbeat metric and want the VM to be restarted when it goes down. I have tried every metric and threshold combination I could think of so that the restart fires only when the thresholds are met, but it does not work for me. Can somebody please share a working template that they have used?

There are 2 problems with this:
1. The VM restart is triggered even though the VM has not gone down.
2. On restart, it deletes the resources and recreates them. The recreation gets stuck with an error saying the Port ID could not be found.

The template I am using is:

heat_template_version: 2013-05-23

description: >
  HEAT template for my service

resources:
  wait_handle:
    type: OS::Heat::UpdateWaitConditionHandle

  catalog:
    type: OS::Nova::Server
    properties:
      name: catalog
      image: "centos"
      flavor: "m1.medium"
      key_name: "ucm-dev-keypair"      
      networks:
        - port: { get_resource: catalog_port }
      user_data:
        str_replace:
          template: |
            #!/bin/bash -v
            mkdir -p /etc/cfn
            cat <<EOF > /etc/cfn/cfn-credentials
            AWSAccessKeyId=AWS_KEY
            AWSSecretKey=AWS_SECRET
            EOF
            chmod 000400 /etc/cfn/cfn-credentials
            cat <<EOF > /etc/cfn/cfn-hup.conf
            [main]
            stack=STACK_NAME
            credential-file=/etc/cfn/cfn-credentials 
            region='nova'
            interval=2
            EOF
            chmod 000600 /etc/cfn/cfn-hup.conf
            cat <<EOF > /tmp/cfn-hup-crontab.txt
            * * * * * /opt/aws/bin/cfn-hup -f
            * * * * * /opt/aws/bin/cfn-push-stats --watch HEARTBEATFAILUREALARM --heartbeat
            EOF
            chmod 000600 /etc/cfn/cfn-credentials            
            crontab /tmp/cfn-hup-crontab.txt            
            /opt/aws/bin/cfn-signal -e 0 -i 'catalog' "wc_url"
          params:
            wc_url: { get_resource: wait_handle }
            AWS_KEY: { get_resource: WebServerKeys }
            AWS_SECRET: { get_attr: [ WebServerKeys, SecretAccessKey ] }
            STACK_NAME: { get_param: 'OS::stack_name' }
            HEARTBEATFAILUREALARM: { get_resource: ha_alarm }

  catalog_port:
    type: OS::Neutron::Port
    properties:
      network_id: "1a068b22-fd93-462d-b7b7-15f56d5e7f17"
      fixed_ips:
        - subnet_id: "3d7a55d6-4f05-49eb-86d8-7e4cf1b5d650"

  wait_condition:
    type: AWS::CloudFormation::WaitCondition
    #depends_on: wait_handle
    properties:
      Count: 1
      Handle: { get_resource: wait_handle }
      Timeout: 3600

  heatproxy:
    type: AWS::IAM::User

  WebServerKeys:
    type: AWS::IAM::AccessKey
    depends_on: heatproxy
    properties:
      UserName: { get_resource: heatproxy }

  ha_alarm:
    type: AWS::CloudWatch::Alarm
    depends_on: wait_condition
    properties:
      AlarmActions: 
      - { get_resource: restart }
      AlarmDescription: Restart the Instance if we miss a heartbeat
      ComparisonOperator: LessThanOrEqualToThreshold
      EvaluationPeriods: '2'
      MetricName: Heartbeat
      Namespace: system/linux
      Period: '60'
      Statistic: Minimum
      Threshold: '0'

  catalog_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network_id: "3dcfc74c-3913-48ca-acae-789a022b7783"
      port_id: { get_resource: catalog_port }

  restart:
    type: OS::Heat::HARestarter
    properties:
      InstanceId: { get_resource: catalog }

outputs:
  catalog_private_ip:
    description: IP address of catalog in private network
    value: { get_attr: [ catalog, first_address ] }
  catalog_public_ip:
    description: Floating IP address of catalog in public network
    value: { get_attr: [ catalog_floating_ip, floating_ip_address ] }

Comments

Can you edit your question to add more specific details? 'not working' is not precise enough information for anyone to help you.

smaffulli (2014-06-20 16:18:40 -0500)

1 answer


answered 2014-08-13 11:56:50 -0500

zaneb

The port error is due to a Nova bug: Nova deletes a port that the user has explicitly created as soon as the server is detached from it, because it can't remember whether it created the port implicitly itself or not.

There is also a bug report in Heat to track this.
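
To connect this to the second symptom in the question, here is a trimmed excerpt of the question's template with editorial comments. Treat it as a sketch of my reading rather than a confirmed trace: I am assuming that HARestarter deletes and recreates the server (plus anything that depends on it) while leaving `catalog_port` in place in the stack, which would explain why the recreated server is handed a port ID that Neutron can no longer find. Properties not relevant to the port are omitted, so this excerpt is not a deployable template on its own.

```yaml
# Excerpt from the template in the question; all comments are editorial.
resources:
  catalog:
    type: OS::Nova::Server
    properties:
      networks:
        # The server attaches to a port that the template creates explicitly.
        - port: { get_resource: catalog_port }

  catalog_port:
    type: OS::Neutron::Port
    # Assumption: this resource is not recreated on restart, but Nova
    # removes the underlying Neutron port when the old server goes away.

  catalog_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      # The floating IP also still refers to the same port ID.
      port_id: { get_resource: catalog_port }

  restart:
    type: OS::Heat::HARestarter
    properties:
      # The alarm action targets the server only.
      InstanceId: { get_resource: catalog }
```

If that reading is right, the stack still holds the old port ID after the restart, so both the new server and the floating IP reference a port that has already been deleted.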

However, the bottom line is that HARestarter is probably unfixable in general, and you should regard it as deprecated.

