Configure Aodh alarm for server failure

I'd like to create an Aodh alarm (or a group of several alarms) that fires when a Nova server stops running, either because the hypervisor it was running on failed or because somebody manually deleted it from Nova. Ideally, I'd like to be able to specify a single server or a group of servers to watch.

It looks like the Event alarm feature could be used for this purpose. Is there a worked example somewhere of how to set up an alarm to trigger in these situations?

Is there another way to create an alarm that triggers in these situations, based on Ceilometer data?

2 answers

answered 2017-02-27 16:41:37 -0600

I ended up using several event alarms to cover various possible actions:

  • Stopping the server from the Nova API
  • Deleting the server from the Nova API
  • Nova putting the server into the 'error' state for whatever reason

I created a Heat template to define these alarms (requires Ocata).

In the case of that template, the alarm action posts to Zaqar, and a subscription is used to trigger a Mistral workflow that tells Heat to replace the server.
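As a rough illustration of what those three event alarms might look like outside of Heat, here is a minimal Python sketch that only builds the JSON bodies one would POST to the Aodh alarm API. It is not the template mentioned above. The event types, the trait names (traits.instance_id, traits.state), the helper names, and the example.invalid action URL are all assumptions made for illustration; check them against the notifications your deployment actually emits.

```python
import json

# Assumed mapping from the three conditions in the list above to an event
# type plus extra trait filters. These names are illustrative guesses and
# should be verified against your own Nova/Ceilometer event definitions.
WATCHED_CONDITIONS = {
    "stopped": ("compute.instance.update", {"traits.state": "stopped"}),
    "deleted": ("compute.instance.delete.end", {}),
    "error": ("compute.instance.update", {"traits.state": "error"}),
}


def build_event_alarm(server_id, condition, action_url):
    """Return a JSON-serialisable body for one Aodh event alarm."""
    event_type, extra_traits = WATCHED_CONDITIONS[condition]
    # Always restrict the alarm to the one server being watched.
    query = [
        {"field": "traits.instance_id", "op": "eq",
         "type": "string", "value": server_id},
    ]
    # Add any condition-specific trait filters (e.g. the server state).
    for field, value in extra_traits.items():
        query.append(
            {"field": field, "op": "eq", "type": "string", "value": value}
        )
    return {
        "name": f"{condition}-{server_id}",
        "type": "event",
        "event_rule": {"event_type": event_type, "query": query},
        "alarm_actions": [action_url],
        "repeat_actions": False,
    }


def build_alarms_for_servers(server_ids, action_url):
    """Build one alarm per (server, condition) pair for a group of servers."""
    return [
        build_event_alarm(server_id, condition, action_url)
        for server_id in server_ids
        for condition in WATCHED_CONDITIONS
    ]


if __name__ == "__main__":
    # Hypothetical server IDs and webhook URL, for demonstration only.
    alarms = build_alarms_for_servers(
        ["srv-1", "srv-2"], "https://example.invalid/alarm-hook"
    )
    print(json.dumps(alarms[0], indent=2))
```

Watching a group of servers this way means one alarm per server per condition, since each alarm's query pins a single instance ID.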

If Nova (or any other tool) sends a notification using oslo.messaging, then in theory you can create an Aodh event alarm that triggers any action.

However, AFAIK neither Nova nor Ceilometer generates such events.

Otherwise, you can use regular (threshold-based) alarms, provided you can identify a threshold in Ceilometer/Gnocchi that maps to that event or problem.

IIRC Nova sends events like 'compute.instance.(stop|delete|resume)', so it should be doable to create event alarms based on these events. With regular alarms you'll need to configure insufficient-data actions to get notified when data stops flowing.
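To make the threshold-based route a bit more concrete, here is a minimal Python sketch of an alarm body that sets insufficient_data_actions, so something is triggered when samples for the server stop arriving. The metric name, granularity, threshold, helper name, and example.invalid URL are assumptions for illustration only, and the rule layout follows my understanding of Aodh's Gnocchi resource threshold alarms; adjust it to your deployment.

```python
import json


def build_threshold_alarm(server_id, action_url, metric="cpu_util",
                          granularity=300, threshold=0.0):
    """Return a JSON-serialisable body for a Gnocchi-backed threshold alarm.

    The metric name and numeric defaults are illustrative assumptions.
    """
    return {
        "name": f"no-data-{server_id}",
        "type": "gnocchi_resources_threshold",
        "gnocchi_resources_threshold_rule": {
            "metric": metric,
            "resource_type": "instance",
            "resource_id": server_id,
            "aggregation_method": "mean",
            "granularity": granularity,
            "evaluation_periods": 1,
            "comparison_operator": "lt",
            "threshold": threshold,
        },
        # Fired when the metric crosses the threshold.
        "alarm_actions": [action_url],
        # Fired when Aodh cannot evaluate the rule because no data arrived,
        # which is the signal of interest when a server disappears.
        "insufficient_data_actions": [action_url],
        "repeat_actions": False,
    }


if __name__ == "__main__":
    # Hypothetical server ID and webhook URL, for demonstration only.
    body = build_threshold_alarm("srv-1", "https://example.invalid/alarm-hook")
    print(json.dumps(body, indent=2))
```

Note that missing data is a slower and less specific signal than an event alarm: it only shows up after an evaluation period passes with no samples, and a stalled telemetry pipeline looks the same as a dead server.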

yprokule ( 2016-10-12 05:01:27 -0600 )

