
Heat stack with unreachable resource stuck in failed state

asked 2016-10-05 01:48:38 -0500

Fenuks

I have the following problem (I'm running OpenStack Mitaka (13.2.0) on Ubuntu 14.04):

  1. I launched a stack via Heat that created an instance and used OS::Heat::SoftwareConfig + OS::Heat::SoftwareDeployment to set up nginx on another instance, the Gateway.
  2. The SoftwareDeployment was configured with all actions (Create, Update, Delete, Suspend, Resume), as I needed it to reconfigure the Gateway accordingly on any action.
  3. At some point the Gateway was irrecoverably broken: a stack update unexpectedly attempted to recreate a port and failed to attach the new one while deleting the old one.
  4. Now any action on the stack from step 1 ends in a failed state, because it triggers the corresponding SoftwareDeployment action, which cannot reach the server and fails after a timeout.
  5. According to the documentation (http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::SoftwareDeployment), updates to "server" cause replacement, so Heat would have to delete the old resource and create a new one, but the delete fails after a timeout and the stack is stuck.

I can't delete the whole stack yet, as it's in use (and I'm not sure deletion would work anyway), so I've created a new gateway and set it up manually. Now I'd like to either remove the SoftwareDeployment that references the old gateway, or make it reference the new one.

I've already tried the following:

  1. Marking the resource unhealthy: Heat tries to "update" it with the existing values and fails after a timeout.
  2. Setting action='DELETE' and status='COMPLETE' directly in the Heat database (the resource table): on stack update Heat tries to "create" the deleted resource with the existing values before replacing it with the new one (bug?), and again fails after a timeout.

I don't know the logic behind this, so I'm not sure what else I can do.

Are there any other ways I can try to fix this?


1 answer


answered 2016-10-07 07:50:46 -0500

zaneb

The good news is that this is fixed in Newton by this patch. There's no Launchpad bug associated with it for some reason, but if you raise one then we could consider backporting it to Mitaka.

The easiest way to work around the problem for now is probably to start an update and manually signal success to the software deployment yourself with the openstack stack resource signal command, taking the place of the server that is supposed to be doing the signalling but is gone.
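
For illustration, the signalling step might be scripted roughly as below. This is only a sketch under a few assumptions, not a tested procedure: the stack name `my_stack` and resource name `gateway_deployment` are hypothetical placeholders, and the payload keys (`deploy_status_code`, `deploy_stdout`, `deploy_stderr`) are assumed to be what Heat expects from a deployment signal, with a status code of 0 read as success. The signal is likely only accepted while the deployment is in progress, so it would be sent after the update has been started. By default the script only prints the command it would run.

```python
import json
import shlex
import subprocess


def build_success_payload(stdout="", stderr=""):
    """Details a healthy server would normally report back to Heat.

    Assumption: a deploy_status_code of 0 is treated as success.
    """
    return {
        "deploy_status_code": 0,
        "deploy_stdout": stdout,
        "deploy_stderr": stderr,
    }


def build_signal_command(stack, resource, payload):
    """Build the argv for `openstack stack resource signal`."""
    return [
        "openstack", "stack", "resource", "signal",
        "--data", json.dumps(payload, sort_keys=True),
        stack, resource,
    ]


def signal_success(stack, resource, dry_run=True):
    """Print the command (dry run) or execute it; return the argv."""
    cmd = build_signal_command(stack, resource, build_success_payload())
    if dry_run:
        # Show a copy-pasteable shell command without touching the cloud.
        print(" ".join(shlex.quote(arg) for arg in cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical names: substitute your own stack and resource.
    signal_success("my_stack", "gateway_deployment", dry_run=True)
```

Start the stack update first (for example with openstack stack update), wait until the deployment resource is in progress, then run the printed command, or call signal_success with dry_run=False, from a shell that has your OpenStack credentials loaded.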


Comments


Thank you, reported as https://bugs.launchpad.net/heat/+bug/1631366 with reference to this question. Will try your workaround and mark your answer.

Fenuks (2016-10-07 08:21:45 -0500)


Last updated: Oct 07 '16