Instance fails to create due to block device setup timeout
Hi, I'm running Icehouse with Ceph over 1G network interfaces. When I try to create an instance with a large volume on the RBD storage backend, it fails with the following errors in the log:
2015-04-16 20:32:26.820 37847 ERROR nova.compute.manager [req-c6e71f84-41bb-4e88-acf8-ea0e85e5473f 7f14a7553320496da7a577966ab3b809 6901ba9f2a134fddae41aa8ee0da7faf] [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] Instance failed block device setup
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] Traceback (most recent call last):
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1708, in _prep_block_device
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] self.driver, self._await_block_device_map_created))
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 378, in attach_block_devices
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] map(_log_and_attach, block_device_mapping)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 376, in _log_and_attach
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] bdm.attach(*attach_args, **attach_kwargs)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 328, in attach
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] wait_func(context, vol['id'])
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1163, in _await_block_device_map_created
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] attempts=attempts)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] VolumeNotCreated: Volume 074bc092-b674-4254-b546-b23725137e62 did not finish being created even after we waited 203 seconds or 180 attempts.
I can see that creating the block device in Ceph takes much longer than Nova is willing to wait. How can I tune this timeout and the number of attempts in Nova?
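For reference, the traceback points at a polling loop in Icehouse's nova/compute/manager.py (_await_block_device_map_created) that re-checks the volume status a fixed number of times. Below is a minimal sketch of that kind of loop; the names and defaults are approximations inferred from the traceback and error message, not the exact upstream code:

    import time

    class VolumeNotCreated(Exception):
        pass

    def await_block_device_map_created(volume_api, context, vol_id,
                                       max_tries=180, wait_between=1):
        # Poll Cinder until the volume leaves the 'creating'/'downloading'
        # states. In Icehouse, max_tries and wait_between are effectively
        # hardcoded rather than driven by nova.conf, which is why the wait
        # cannot be tuned without patching the code.
        start = time.time()
        for attempt in range(1, max_tries + 1):
            volume = volume_api.get(context, vol_id)
            if volume['status'] not in ('creating', 'downloading'):
                return attempt
            time.sleep(wait_between)
        raise VolumeNotCreated(
            'Volume %s did not finish being created even after we waited '
            '%d seconds or %d attempts.'
            % (vol_id, int(time.time() - start), max_tries))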
What's the current status of your Ceph cluster? The output of 'ceph -s', 'rados df' and 'ceph osd tree' could help.
Ceph is fine; as I said, the problem is only the time it takes to copy large images to/from Ceph. This issue is related to known bug #1332382. It's a pity that it is fixed only in later releases, so for now I have to patch the value manually in the Python code on all my compute nodes.
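For anyone hitting this on a release that carries the fix for bug #1332382, the retry behaviour is exposed through nova.conf options instead of hardcoded values. A sketch of the relevant settings on the compute nodes follows; the option names come from the upstream fix, and the values shown are only illustrative for a slow 1G network, so tune them to your environment:

    [DEFAULT]
    # Number of times to poll the volume status before giving up
    # (the upstream default is 60 in releases that have the fix)
    block_device_allocate_retries = 600
    # Seconds to wait between polls (upstream default is 3)
    block_device_allocate_retries_interval = 3

After changing these, restart the nova-compute service so the new values take effect.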