instance failed to create due to block device setup timeout

asked 2015-04-16 12:43:43 -0500 by brussels

updated 2015-04-18 13:33:42 -0500 by SGPJ

Hi, I'm using Icehouse with Ceph over 1G network interfaces. When I try to create an instance with a large volume on the RBD storage backend, it fails with the following errors in the log:

2015-04-16 20:32:26.820 37847 ERROR nova.compute.manager [req-c6e71f84-41bb-4e88-acf8-ea0e85e5473f 7f14a7553320496da7a577966ab3b809 6901ba9f2a134fddae41aa8ee0da7faf] [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] Instance failed block device setup
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] Traceback (most recent call last):
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1708, in _prep_block_device
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]     self.driver, self._await_block_device_map_created))
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]   File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 378, in attach_block_devices
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]     map(_log_and_attach, block_device_mapping)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]   File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 376, in _log_and_attach
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]     bdm.attach(*attach_args, **attach_kwargs)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]   File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 328, in attach
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]     wait_func(context, vol['id'])
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1163, in _await_block_device_map_created
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58]     attempts=attempts)
2015-04-16 20:32:26.820 37847 TRACE nova.compute.manager [instance: e0fb3b5b-557b-4015-bab3-e95b9e2f2b58] VolumeNotCreated: Volume 074bc092-b674-4254-b546-b23725137e62 did not finish being created even after we waited 203 seconds or 180 attempts.

I can see that creating the block device in Ceph takes much longer than Nova waits. How can I tune this timeout and the number of attempts in Nova?


Comments

What's the current status of your Ceph cluster? The output of 'ceph -s', 'rados df' and 'ceph osd tree' could possibly help.

omar-munoz ( 2015-04-16 19:29:45 -0500 )

Ceph is OK, and as I said, the problem is only the time it takes to copy large images to/from Ceph. This issue is connected to the known bug #1332382. It's a pity it is fixed only in the latest versions, so for now I need to change this value manually in the Python code on all my compute nodes.

brussels ( 2015-04-17 03:15:32 -0500 )

5 answers


answered 2017-03-05 20:49:21 -0500

updated 2017-03-05 20:49:37 -0500

Hi all, I tried to change the value in nova.conf on the nova controller node, but it doesn't work in the Icehouse version. The only way to fix it in my environment is to change the value in /usr/lib/python2.7/dist-packages/nova/compute/manager.py.

If anyone has an idea about this, kindly help and update.
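For reference, the value being edited is the hard-coded attempt count in _await_block_device_map_created. A rough sketch of the Icehouse code follows; the exact signature and defaults can differ between point releases, so treat it as an illustration rather than the verbatim source:

# /usr/lib/python2.7/dist-packages/nova/compute/manager.py
# The 180-attempt default matches the "180 attempts" in the traceback above.
# Raising max_tries makes Nova poll Cinder longer before giving up.
def _await_block_device_map_created(self, context, vol_id,
                                    max_tries=600,  # default was 180
                                    wait_between=1):
    # ... body (the polling loop) stays unchanged ...

Remember to restart nova-compute on every compute node after editing the file.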


answered 2017-03-06 03:33:31 -0500 by omkar_telee

This is not the intended answer, so please don't downvote. Sometimes, even if instance creation fails, the volume is created successfully. You can use that volume to boot a new instance.

This happens in my case mostly when I boot an instance larger than 100 GB.
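For example, a sketch with the Icehouse-era command-line clients (the flavor and instance names are placeholders; if your novaclient build lacks the --boot-volume shortcut, the longer --block-device form achieves the same thing):

# check that the leftover volume actually reached the 'available' state
cinder list

# boot a new instance directly from that volume
nova boot --flavor m1.medium --boot-volume <volume-id> my-instance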


answered 2016-11-01 00:45:53 -0500 by jiasir

updated 2016-11-01 00:48:46 -0500

Increase the block device mapping timeouts in nova.conf:

block_device_allocate_retries = 60 (default) to 300

block_device_allocate_retries_interval = 3 (default) to 10

block_device_creation_timeout = 10 (default) to 300
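As a concrete sketch, the first two options go in the [DEFAULT] section of nova.conf on the compute nodes (block_device_creation_timeout may not exist in every release; check the configuration reference for your version). Note they only take effect on releases that carry the fix for bug #1332382 mentioned in the comments, which is why plain Icehouse ignores them:

[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10

Restart nova-compute afterwards for the new values to apply.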


answered 2015-04-18 12:11:12 -0500 by dodi

updated 2015-04-18 13:34:05 -0500 by SGPJ

You should use RBD for Glance as well and enable the copy-on-write feature in /etc/glance/glance-api.conf.

Change this setting from False to True:

show_image_direct_url = True

Restart the Glance services and give it a go. If you don't, deployment takes a long time because Cinder does a qemu-img convert from the source format to raw, which adds to the deployment time.
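A sketch of the relevant /etc/glance/glance-api.conf pieces for an Icehouse-era setup; the pool, user, and keyring values below are common defaults rather than anything from this thread, so adapt them to your cluster:

[DEFAULT]
default_store = rbd
show_image_direct_url = True
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_user = glance
rbd_store_pool = images

Also note that the copy-on-write clone only works when the image is stored in raw format; QCOW2 images are still converted first.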


answered 2015-04-18 19:08:55 -0500

You can and should configure the block device timeout on nova-compute.

In nova.conf on the compute node:

block_device_allocate_retries=600
block_device_allocate_retries_interval=1

The above settings will make Nova wait up to 600 seconds, checking every second.

The default number of retries is 180, I think. Set it higher than 600 if you want to support larger images, as they can take more time to copy.
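After editing nova.conf, restart the compute service so the new values are picked up (the command assumes an Ubuntu-style init system):

service nova-compute restart

The total wait is roughly retries multiplied by the interval: 600 retries at a 1-second interval gives about 600 seconds here, while the 300/10 combination suggested in the earlier answer allows up to about 3000 seconds.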
