
Cinder RabbitMQ timeout problem with multiple rabbit hosts

asked 2015-07-15 02:50:52 -0600

zsolt-krenak

updated 2015-07-15 02:52:52 -0600

Hi All!

I have a 3-node HA controller setup where cinder-volume runs in Active/Passive mode under Corosync/Pacemaker, and RabbitMQ is clustered with mirrored queues. When I shut down the node that cinder-volume is currently connected to over AMQP, cinder does not switch to another rabbit host until exactly 15 minutes have passed. The same seems to happen to cinder-scheduler, but to no other OpenStack service: neutron, nova, and the others switch to another rabbit host after a few timeouts. I could use any help on this, thanks in advance!

My rabbit config:

[oslo_messaging_rabbit]

rabbit_ha_queues = True
rabbit_hosts = 192.168.56.20:5672,192.168.56.21:5672,192.168.56.22:5672
rabbit_userid = openstack
rabbit_password = verysecurepassword

Here's the part of the log just before the 15-minute delay:

2015-07-14 17:37:38.601 6697 DEBUG oslo_messaging._drivers.impl_rabbit [-] Received recoverable error from kombu: on_error /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:783
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 436, in _ensured
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 508, in __call__
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 832, in execute_method
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     method()
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1025, in _consume
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     rate=self.driver_conf.heartbeat_rate)
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 264, in heartbeat_check
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     return self.transport.heartbeat_check(self.connection, rate=rate)
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 131, in heartbeat_check
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     return connection.heartbeat_tick(rate=rate)
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 914, in heartbeat_tick
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit     raise ConnectionForced('Too many heartbeats missed')
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit ConnectionForced: Too many heartbeats missed
2015-07-14 17:37:38.601 6697 TRACE oslo_messaging._drivers.impl_rabbit 
2015-07-14 17:37:38.604 6697 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.56.20:5672 is unreachable: Too many heartbeats missed. Trying again in 1 seconds.
2015-07-14 17:37:40.603 ...

Comments

Please go through this thread and check whether it helps resolve your issue.

Also, I am not sure about the 15-minute timeout; you may need to check whether cinder.conf specifies some default timeout.

sunnyarora ( 2015-07-15 19:24:32 -0600 )

Thanks for the comment. Sadly, I didn't find a solution in that thread.

zsolt-krenak ( 2015-07-16 01:52:46 -0600 )

2 answers


answered 2015-10-20 09:43:24 -0600

Attila Szlovencsak

Hi!

Setting kombu_reconnect_delay=3.0 seems to solve this problem.

https://ask.openstack.org/en/question...
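
As a sketch, the option would go in the same [oslo_messaging_rabbit] section of cinder.conf as the rabbit settings from the question. The host list below is copied from the question; only kombu_reconnect_delay is new. Whether 3.0 is the right value for a given deployment is an assumption to verify, not something confirmed here.

```ini
[oslo_messaging_rabbit]
rabbit_ha_queues = True
rabbit_hosts = 192.168.56.20:5672,192.168.56.21:5672,192.168.56.22:5672
# Seconds to wait before reconnecting after an AMQP consumer cancel notification.
kombu_reconnect_delay = 3.0
```

The cinder services would need a restart to pick up the change.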


answered 2015-07-23 06:47:11 -0600

zsolt-krenak

I still couldn't find a solution for this problem. As a workaround, I configured cinder-volume to always connect to RabbitMQ on localhost, so the scenario that triggers the problem can never occur. It would still be good to find a proper solution.
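
For illustration only, the localhost workaround might look roughly like the fragment below in cinder.conf on each controller. This is a guess at the shape of the change, not the exact configuration used: the option name is taken from the config in the question, and the address and port (127.0.0.1, default 5672) are assumptions.

```ini
[oslo_messaging_rabbit]
rabbit_ha_queues = True
# Assumption: a RabbitMQ cluster member listens on this node on the default port.
rabbit_hosts = 127.0.0.1:5672
rabbit_userid = openstack
rabbit_password = verysecurepassword
```

With a single local entry there is no other rabbit host to fail over to, so this only makes sense when cinder-volume and its broker always run, and fail, together on the same node.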
