RabbitMQ failover issue Kilo

Hi!

I have an issue with HA RabbitMQ queues under Kilo (2015.1.1).

If a RabbitMQ node fails, some of the services (such as nova-scheduler) manage to fail over to another node:

2015-10-19 16:47:41.741 28727 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on controller1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-19 16:47:42.763 28727 DEBUG amqp [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL.  See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 GoPivotal, Inc.', u'capabilities': {u'exchange_exchange_bindings': True, u'connection.blocked': True, u'authentication_failure_close': True, u'basic.nack': True, u'consumer_priorities': True, u'consumer_cancel_notify': True, u'publisher_confirms': True}, u'platform': u'Erlang/OTP', u'version': u'3.2.4'}, mechanisms: [u'AMQPLAIN', u'PLAIN'], locales: [u'en_US'] _start /usr/lib/python2.7/dist-packages/amqp/connection.py:754
2015-10-19 16:47:42.780 28727 DEBUG amqp [-] Open OK! _open_ok /usr/lib/python2.7/dist-packages/amqp/connection.py:640
2015-10-19 16:47:42.782 28727 DEBUG amqp [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqp/channel.py:80
2015-10-19 16:47:42.790 28727 DEBUG amqp [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqp/channel.py:438
2015-10-19 16:47:43.053 28727 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller2:5672

However, on the same node, nova-api fails with the following log:

2015-10-19 16:47:42.014 11763 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on controller1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-19 16:47:42.271 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:42.899 11764 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller2:5672
2015-10-19 16:47:42.906 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.052 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.151 11796 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.541 11796 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.543 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.569 11795 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.607 11763 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:44.163 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:44.326 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111]
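
The nova-api lines above are interleaved from several process IDs (11763, 11764, 11795, 11796), while the nova-scheduler excerpt only shows 28727. To make the sequence easier to follow, the lines can be grouped per PID. Below is a minimal Python 3 sketch, assuming the log layout shown above (date, time, PID, level, logger, "[-]", message). The function name `events_by_pid` and the regular expression are illustrative helpers of mine, not part of nova or oslo.messaging.

```python
import re
from collections import defaultdict

# Assumed layout: "<date> <time> <pid> <LEVEL> <logger> [-] <message>"
LINE_RE = re.compile(r"^(\S+ \S+) (\d+) (\w+) (\S+) \[-\] (.*)$")


def events_by_pid(lines):
    """Map each PID to the ordered list of connection events it logged."""
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip wrapped or unrelated lines
        _, pid, _, _, message = match.groups()
        if message.startswith("Reconnected to AMQP server on "):
            # keep the broker address, e.g. "controller3:5672"
            events[pid].append("reconnected:" + message.rsplit(" ", 1)[-1])
        elif "is unreachable" in message:
            events[pid].append("unreachable")
        elif "trying to reconnect" in message:
            events[pid].append("error")
    return dict(events)
```

Calling `events_by_pid(open(path))` on the nova-api log (where `path` is wherever that log lives on the system) returns one event list per PID, which shows whether each process reconnects and then fails again, or whether the errors come from different processes.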

I wonder why this happens, since both services use the same nova.conf:

[DEFAULT]
default_log_levels=oslo.messaging=DEBUG,kombu=DEBUG
rpc_backend = rabbit
...
[oslo_messaging_rabbit]
rabbit_ha_queues = True
heartbeat_timeout_threshold = 5
heartbeat_rate = 2
rabbit_max_retries = 2
rabbit_hosts = controller1:5672,controller3:5672,controller2:5672
rabbit_use_ssl = false
rabbit_userid = openstack
rabbit_password = <...>
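
As a sanity check on the `rabbit_hosts` value above, each listed broker can be probed with a plain TCP connect from the affected node. This is only a sketch in Python 3 (the services themselves run on Python 2.7 here). The helper names `parse_rabbit_hosts` and `is_reachable` are hypothetical, not oslo.messaging functions. It only tests that the port accepts connections, not AMQP authentication or queue mirroring, and it does not handle IPv6 literals.

```python
import socket


def parse_rabbit_hosts(value, default_port=5672):
    """Split a rabbit_hosts value into (host, port) tuples, keeping the configured order."""
    hosts = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            # no explicit port given, fall back to the default AMQP port
            hosts.append((item, default_port))
    return hosts


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `[(h, p, is_reachable(h, p)) for h, p in parse_rabbit_hosts("controller1:5672,controller3:5672,controller2:5672")]` lists which brokers accept connections at that moment.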

The relevant Python libraries are:

root@controller2:~# pip freeze 2>&1|grep -E 'amq|kombu|oslo.m|nova'
amqp==1.4.6
amqplib==1.0.2
kombu==3.0.24
nova==2015.1.1
oslo.messaging==1.8.3
oslo.middleware==1.0.0
python-novaclient==2.22.0