nova-compute agent hangs in futex call

asked 2015-12-09 10:11:36 -0600

Hello, I'm running the Liberty release on Ubuntu 14.04 and had to recreate the nova database. After I recreated it using nova-manage db sync, all the nova-compute agents are stuck in this syscall:

futex(0x2dd2914, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 25586585, {1449677025, 974868766}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
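For reference, the `{1449677025, 974868766}` argument in that call is the wait deadline. With FUTEX_WAIT_BITSET and FUTEX_CLOCK_REALTIME it is an absolute wall-clock time, and ETIMEDOUT only means the deadline passed, so this line on its own need not indicate a deadlock. A minimal sketch for decoding such a pair from strace output (the helper name `futex_deadline` is made up for illustration):

```python
from datetime import datetime, timezone


def futex_deadline(sec: int, nsec: int) -> datetime:
    """Convert a {tv_sec, tv_nsec} pair copied from strace into a UTC datetime."""
    # datetime only has microsecond resolution, so the nanoseconds are truncated.
    base = datetime.fromtimestamp(sec, tz=timezone.utc)
    return base.replace(microsecond=nsec // 1000)


# Values copied from the futex() line above.
deadline = futex_deadline(1449677025, 974868766)
print(deadline.isoformat())
```

Comparing the decoded deadline with the time strace was run shows whether the thread is merely sleeping with a timeout and being woken repeatedly.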

All the other nova services running on the controllers are working fine. Has anyone else had this issue? The debug messages in /var/log/nova/nova-compute.log look like this:

2015-12-09 10:52:32.240 13748 ERROR nova.compute.manager [req-39f672ac-46cb-4ca5-a81e-01ef90fd06ed - - - - -] No compute node record for host xxxx
2015-12-09 10:52:32.243 13748 WARNING nova.compute.monitors [req-39f672ac-46cb-4ca5-a81e-01ef90fd06ed - - - - -] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
2015-12-09 10:52:32.244 13748 INFO nova.compute.resource_tracker [req-39f672ac-46cb-4ca5-a81e-01ef90fd06ed - - - - -] Auditing locally available compute resources for node xxxx

1 answer

answered 2015-12-10 08:31:22 -0600

This happened because the nova.compute.resource_tracker component couldn't connect to Ceph to report the available storage space back to the controller. I had recently removed a Ceph pool that I used for nova ephemeral storage, and for an unknown reason I could no longer connect with the same Ceph user. After I created another Ceph user, regenerated the libvirt secret for it, and added it to nova.conf, everything started working again.
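To confirm this kind of failure from the compute host before touching nova, you can check whether the Ceph user can reach the cluster at all. The following is only a rough sketch, not what was run in this case: it shells out to the `ceph` CLI with a connect timeout so it cannot block forever. The client name `nova`, the config path, and the helper names are assumptions for illustration.

```python
import subprocess


def build_ceph_df_cmd(user, conf="/etc/ceph/ceph.conf", connect_timeout=10):
    """Build a `ceph df` command line that gives up after connect_timeout seconds."""
    return [
        "ceph",
        "--conf", conf,
        "--id", user,  # client name without the "client." prefix
        "--connect-timeout", str(connect_timeout),
        "df",
    ]


def ceph_reachable(user, conf="/etc/ceph/ceph.conf", connect_timeout=10,
                   run=subprocess.run):
    """Return True if `ceph df` succeeds for this user, False otherwise."""
    cmd = build_ceph_df_cmd(user, conf, connect_timeout)
    try:
        # Outer timeout is a safety net in case the CLI itself stalls.
        result = run(cmd, capture_output=True, text=True,
                     timeout=connect_timeout + 5)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "nova" is a placeholder; use the Ceph user that nova.conf points at.
    print("ceph reachable:", ceph_reachable("nova"))
```

If this returns False for the user configured in nova.conf but True for another Ceph user, the problem is with that user's key or capabilities and not with nova-compute itself.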
