How can I delete a busy resource provider?

asked 2019-09-05 05:19:23 -0500

updated 2019-09-06 04:35:13 -0500

I have a compute host on which I can't start instances and to which I can't migrate them. The nova log on the host shows these error messages:

2019-09-05 11:15:50.189 8 ERROR [req-8abd8387-ba64-43ae-bfec-9e390f0117d1 - - - - -] [req-ed0096a9-e843-4f68-8fd0-873d5aa0cbf9] Failed to create resource provider record in placement API for UUID e9d75bbf-1cc2-41c4-956d-bd1451be8d8b. Got 409: {"errors": [{"status": 409, "request_id": "req-ed0096a9-e843-4f68-8fd0-873d5aa0cbf9", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: already exists. ", "title": "Conflict"}]}.
2019-09-05 11:15:50.190 8 ERROR nova.compute.manager [req-8abd8387-ba64-43ae-bfec-9e390f0117d1 - - - - -] Error updating resources for node ResourceProviderCreationFailed: Failed to create resource provider

I found (this similar question), which pointed me towards installing osc-placement. That got me further, but I'm unable to remove the existing resource provider as suggested in the answer to that question:

$ openstack resource provider list
| uuid                                 | name                                | generation |
| cdfa7343-1258-4b1c-9577-889972bc851c |    |       1276 |

$ openstack resource provider delete cdfa7343-1258-4b1c-9577-889972bc851c
Unable to delete resource provider cdfa7343-1258-4b1c-9577-889972bc851c: Resource provider has allocations. (HTTP 409)

There are no instances running on the host:

$ openstack server list --host cdfa7343-1258-4b1c-9577-889972bc851c
[empty result]

The command openstack resource provider allocation show requires the UUID of a consumer, but I can't find any consumer on that host.

How can I delete the resource provider, so it can register again?

Further investigation:

I ran

# Print every server that has an allocation against the stuck resource provider
for SERVER in $(openstack server list --all-projects -f value -c ID); do
    if openstack resource provider allocation show -f value "$SERVER" | grep -q cdfa7343-1258-4b1c-9577-889972bc851c; then
        echo "$SERVER"
    fi
done

to find instances that have allocations on that host. This turned up several instances with allocations on compute05 (the affected host), although openstack server show reports that they are actually running on a different host.

How can I detach these instances from compute05?

answered 2019-09-05 08:22:56 -0500

After some trials with test instances I determined that openstack resource provider allocation delete does not have any impact on the instance itself. After the deletion, openstack resource provider allocation show no longer lists any allocations for the instance, but once the instance is migrated to a different host the allocation is back (against the correct host).

So, after determining that it is safe to delete current resource allocations, I ran openstack resource provider allocation delete on all instances that were attached to compute05.
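The cleanup described above could be scripted roughly as follows. This is only a sketch: the function names find_consumers_on_provider and delete_allocations_on_provider are made up for illustration, and it assumes the openstack client with osc-placement is installed and admin credentials are loaded. Be aware that allocation delete removes all of a consumer's allocations, not only the ones on the stale provider, so it is worth testing on a throwaway instance first, as I did.

```shell
# Sketch only: helper names are hypothetical, not part of any OpenStack tool.

# Print the ID of every server holding an allocation against the given provider UUID.
find_consumers_on_provider() {
    provider="$1"
    for server in $(openstack server list --all-projects -f value -c ID); do
        if openstack resource provider allocation show -f value "$server" | grep -q "$provider"; then
            echo "$server"
        fi
    done
}

# Delete the allocations of every server found by the function above.
# Note: this removes ALL allocations of each consumer, on every provider.
delete_allocations_on_provider() {
    provider="$1"
    for server in $(find_consumers_on_provider "$provider"); do
        echo "Deleting allocations for $server"
        openstack resource provider allocation delete "$server"
    done
}
```

Usage would be something like delete_allocations_on_provider cdfa7343-1258-4b1c-9577-889972bc851c, after first reviewing the output of find_consumers_on_provider with the same UUID.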

After that I could run

openstack resource provider delete cdfa7343-1258-4b1c-9577-889972bc851c

without error.

Within a minute the nova-compute service had registered itself again, and the compute host was back in working condition. I can migrate instances to it and start instances on it.
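If you would rather not watch for the re-registration by hand, a small polling loop along these lines might do. It is a sketch under assumptions: wait_for_provider and POLL_INTERVAL are names invented here, and the provider name is assumed to be the hypervisor hostname of the compute node (which may be a fully qualified name in your deployment).

```shell
# Sketch only: wait_for_provider and POLL_INTERVAL are hypothetical names.
# Polls the placement API until a resource provider with the given name exists.
wait_for_provider() {
    name="$1"
    timeout="${2:-120}"           # seconds to wait before giving up
    interval="${POLL_INTERVAL:-5}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # -x makes grep match the whole line, so compute05 does not match compute050
        if openstack resource provider list -f value -c name | grep -qx "$name"; then
            echo "resource provider $name is registered"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "resource provider $name did not appear within ${timeout}s" >&2
    return 1
}
```

For example, wait_for_provider compute05 120 would return as soon as nova-compute has created its provider record again, or fail after two minutes.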

