Provisioning randomly fails

asked 2015-09-20 20:44:34 -0500

Major-Pickle

We have a pretty big OpenStack infrastructure based on Icehouse. We will upgrade one day. Anyway, we have run into a very peculiar problem. 95% of the time we can provision new VMs just fine. We do this from a Ruby script using the Fog library. The other 5% of the time, though, the VM fails to provision and gets stuck in the BUILD state. The VM record gets created, but the hypervisor and IP are never set. So the controller seems to know about the VM, but the compute node it was intended for never receives it. There are no logs at all for it on that compute node. To rule out a problem with the node itself, we then create a new VM and direct it to that same compute node, and it works just fine. In fact, we have never had that fail.
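For anyone trying to reproduce this, a minimal sketch of how a stuck build could be detected from the Ruby side is below. It is not our actual script. `wait_until_built` and `StuckInBuild` are hypothetical names made up for illustration, and the helper only assumes it is given a block that returns the server's current status string.

```ruby
# Hypothetical helper (not part of Fog): polls a status-returning block until
# the server leaves the BUILD state or the timeout expires.
class StuckInBuild < StandardError; end

def wait_until_built(timeout: 300, interval: 5)
  # Use the monotonic clock so wall-clock adjustments do not affect the deadline.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = yield.to_s.upcase
    # Any status other than BUILD (for example ACTIVE or ERROR) ends the wait.
    return status unless status == 'BUILD'
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise StuckInBuild, "server still in BUILD after #{timeout}s"
    end
    sleep interval
  end
end
```

With Fog, the block would presumably be something like `{ server.reload; server.state }`, assuming `server` is the model returned by the create call. Raising on timeout at least gives the script a clear point at which to log the server ID and request time for later comparison against the controller logs.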

We have done a lot of investigation and ruled out all the low-hanging fruit, such as insufficient resources, bad return codes, etc. Everything checks out, and we are just stuck looking at a blank screen because, for all intents and purposes, it should have worked.
