Heat stack with unreachable resource stuck in failed state
I have the following problem (I'm running OpenStack Mitaka (13.2.0) on Ubuntu 14.04):
- I launched a stack via Heat that created an instance and used OS::Heat::SoftwareConfig + OS::Heat::SoftwareDeployment to set up nginx on another instance, the Gateway.
- The SoftwareDeployment was configured with all actions (CREATE, UPDATE, DELETE, SUSPEND, RESUME), as I needed it to reconfigure the Gateway accordingly on any action.
- At some point the Gateway was irrecoverably broken: a stack update unexpectedly tried to recreate a port and failed to attach the new one while deleting the old one.
- Now any action on the stack from the first point ends in a failed state, because it triggers the corresponding SoftwareDeployment action, which cannot reach the server and fails after the timeout.
- According to the documentation (http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::SoftwareDeployment), updates to "server" cause replacement, so Heat would have to delete the old resource and create a new one, but the delete fails after the timeout and the stack is stuck.
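The setup described in the first two points can be sketched roughly as the template fragment below. This is only an illustration and not the actual template: the resource names, the gateway_server_id parameter and the script body are all hypothetical, and I'm assuming the script hook exposes the current action to the script as deploy_action.

```yaml
heat_template_version: 2016-04-08

parameters:
  # Hypothetical parameter: ID of the already existing Gateway instance.
  gateway_server_id:
    type: string

resources:
  # Hypothetical config resource; the script body is a placeholder.
  nginx_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        # Assumption: the hook passes the stack action as $deploy_action.
        echo "reconfiguring nginx for action: $deploy_action"

  # The deployment runs on the Gateway for every stack action, so every
  # action (including DELETE) needs the Gateway to be reachable.
  nginx_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: { get_resource: nginx_config }
      server: { get_param: gateway_server_id }
      actions: [CREATE, UPDATE, DELETE, SUSPEND, RESUME]
```

With DELETE in the actions list, removing or replacing the deployment also has to wait for the server to respond, which matches the timeout behaviour described above.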
I can't delete the whole stack yet, as it's in use (and I'm not sure the delete would even work), so I've created a new gateway and set it up manually. Now I'd like to either remove the SoftwareDeployment that references the old gateway, or make it reference the new one.
I've tried the following already:
- Marking the resource unhealthy: Heat tries to "update" it with the existing values and fails after the timeout.
- Setting action='DELETE' and status='COMPLETE' directly in the Heat database (table resource): on stack update, Heat tries to "create" the deleted resource with the existing values before replacing it with the new one (a bug?), and again fails after the timeout.
I don't know the logic behind this, so I'm not sure what else I can do.
Are there any other ways I can try to fix this?