
Some concurrent attach operations fail with "Caught error: Timed out waiting for a reply to message ID XX"

asked 2015-10-26 08:03:14 -0500 by keisuke-ikeda, updated 2015-11-06 07:42:51 -0500

I created a script that creates, attaches, detaches, and deletes a volume, and kicked it off 10 times at two-second intervals, so ten operations ran concurrently. Every attach operation targets the same instance. Six of the ten runs failed at the attach-volume step. The nova-api log, grepped by request ID, is shown below; the request ID is of course different for each request. What causes this error, and how can I configure the system so that these operations succeed?
I attached the nova-api.log, my scripts (cinder-parallel_r1.sh, the parent, and cinder-CADD_r1.sh, the child), and the script log.
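For reference, a minimal sketch of what the parent script might look like, assuming the two-second interval described above; the command and file names are my reconstruction, not the poster's actual scripts:

```shell
#!/bin/sh
# Hypothetical reconstruction of cinder-parallel_r1.sh: kick off the
# create/attach/detach/delete child script ten times, two seconds apart,
# so all ten runs overlap.
CHILD=./cinder-CADD_r1.sh     # child does: create -> attach -> detach -> delete

for i in $(seq 1 10); do
  "$CHILD" "$i" > "cadd_$i.log" 2>&1 &   # background, so the runs overlap
  sleep 2
done
wait                                     # collect all ten children
```

With this structure, all ten attach requests hit the same instance while earlier runs are still in flight, which is exactly the overlap that triggers the failures below.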

2015-10-26 20:56:06.820 13147 DEBUG nova.api.openstack.wsgi [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] Action: 'create', calling method: <bound method VolumeAttachmentController.create of <nova.api.openstack.compute.contrib.volumes.VolumeAttachmentController object at 0x4a85390>>, body: {"volumeAttachment": {"device": null, "volumeId": "04a0dad0-c5f5-4acf-a25d-6888e22a9cc3"}} _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:780
2015-10-26 20:56:06.821 13147 INFO nova.api.openstack.compute.contrib.volumes [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] Attach volume 04a0dad0-c5f5-4acf-a25d-6888e22a9cc3 to instance 29f7c799-0a68-4fee-a349-242182a8becc at None
2015-10-26 20:56:06.821 13147 DEBUG nova.compute.api [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] [instance: 29f7c799-0a68-4fee-a349-242182a8becc] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:1911
2015-10-26 20:57:06.991 13147 ERROR nova.api.openstack [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] Caught error: Timed out waiting for a reply to message ID 7a1d6f994a7545caae5cced3edd0e5e2
2015-10-26 20:57:06.993 13147 INFO nova.api.openstack [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] http://10.32.4.194:8774/v2/a2714e56a2cc48c6919fe0c3a6d71105/servers/29f7c799-0a68-4fee-a349-242182a8becc/os-volume_attachments returned with HTTP 500
2015-10-26 20:57:06.994 13147 DEBUG nova.api.openstack.wsgi [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] Returning 500 to user: The server has either erred or is incapable of performing the requested operation. __call__ /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:1166
2015-10-26 20:57:06.994 13147 INFO nova.osapi_compute.wsgi.server [req-e38253b2-159f-4b9a-a1fd-4f307b208e2d 7d409352a9364606919753ca957a7b18 a2714e56a2cc48c6919fe0c3a6d71105 - - -] 10.32.4.208 "POST /v2/a2714e56a2cc48c6919fe0c3a6d71105/servers/29f7c799-0a68-4fee-a349-242182a8becc/os-volume_attachments HTTP/1.1" status: 500 len: 359 time: 60.1805718

Comments

"Timed out waiting for a reply to message ID XX"

I would suspect a resource shortage on the system where the sender of the message runs. A typical case is running OpenStack in a few VMs on an overloaded PC. By the way, your log is unreadable; please make sure it is formatted correctly.

Bernd Bausch ( 2015-10-31 01:24:58 -0500 )

Thank you for the comment, and sorry for the unreadable log; I have tried to format it. The KVM host has 8 GB of memory and 4 cores, and runs one Windows 2012 VM (4 GB memory, 2 cores) and one CentOS 6 VM (1 GB memory, 1 core). I am not sure whether a server with this spec would struggle to handle 10 concurrent operations.

keisuke-ikeda ( 2015-11-06 07:52:35 -0500 )

1 answer


answered 2015-12-16 07:55:46 -0500

I think the RPC call behind the attach timed out after 60 seconds because of the increased load. Try raising rpc_response_timeout in cinder.conf to a value larger than 60 and see whether the operations then succeed.

In /etc/cinder/cinder.conf (60 seconds is the default; raise it):

[DEFAULT]
rpc_response_timeout = 60
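As a sanity check on this explanation (my own back-of-the-envelope sketch, not from the original post): if the attach RPCs are processed serially and each one takes roughly 15 seconds (an assumed figure), the k-th queued request waits about k × 15 seconds for its reply, so every request queued at position four or later exceeds the 60-second timeout:

```shell
#!/bin/sh
# Assumed per-attach service time and the oslo.messaging default timeout.
ATTACH_SECS=15   # hypothetical: how long one serialized attach takes
TIMEOUT=60       # rpc_response_timeout default

for k in 0 1 2 3 4 5 6 7 8 9; do
  wait_secs=$((k * ATTACH_SECS))   # time spent queued behind earlier attaches
  if [ "$wait_secs" -ge "$TIMEOUT" ]; then
    echo "request $k: queued ${wait_secs}s -> times out"
  else
    echo "request $k: queued ${wait_secs}s -> succeeds"
  fi
done
```

With these assumed numbers, requests 0 through 3 succeed and requests 4 through 9 time out: six failures out of ten, which matches the observed behaviour and suggests why raising the timeout (or reducing concurrency) helps.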



Stats: asked 2015-10-26 08:03:14 -0500; seen 820 times; last updated Nov 06 '15