RabbitMQ failover issue in Kilo
Hi!
I have an issue with HA RabbitMQ queues under Kilo (2015.1.1).
If a RabbitMQ node fails, some of the services (like nova-scheduler) manage to fail over to another node:
2015-10-19 16:47:41.741 28727 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on controller1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-19 16:47:42.763 28727 DEBUG amqp [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 GoPivotal, Inc.', u'capabilities': {u'exchange_exchange_bindings': True, u'connection.blocked': True, u'authentication_failure_close': True, u'basic.nack': True, u'consumer_priorities': True, u'consumer_cancel_notify': True, u'publisher_confirms': True}, u'platform': u'Erlang/OTP', u'version': u'3.2.4'}, mechanisms: [u'AMQPLAIN', u'PLAIN'], locales: [u'en_US'] _start /usr/lib/python2.7/dist-packages/amqp/connection.py:754
2015-10-19 16:47:42.780 28727 DEBUG amqp [-] Open OK! _open_ok /usr/lib/python2.7/dist-packages/amqp/connection.py:640
2015-10-19 16:47:42.782 28727 DEBUG amqp [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqp/channel.py:80
2015-10-19 16:47:42.790 28727 DEBUG amqp [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqp/channel.py:438
2015-10-19 16:47:43.053 28727 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller2:5672
But on the same node, nova-api fails with the following log:
2015-10-19 16:47:42.014 11763 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on controller1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-19 16:47:42.271 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:42.899 11764 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller2:5672
2015-10-19 16:47:42.906 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.052 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.151 11796 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.541 11796 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.543 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.569 11795 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.607 11763 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:44.163 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:44.326 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111]
I wonder why this happens, since both services use the same nova.conf.
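
The nova-api lines above are interleaved from several processes, which makes the sequence hard to follow. Below is a minimal sketch that groups them by process, assuming the number after the timestamp is the process ID (as in the default oslo.log line format). The names `LINE_RE` and `group_by_pid` are made up for this example, and the log lines are the ones pasted above, copied as-is (including the last, truncated one).

```python
import re
from collections import defaultdict

# nova-api log lines from the post, unchanged.
LOG = """\
2015-10-19 16:47:42.014 11763 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on controller1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-19 16:47:42.271 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:42.899 11764 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller2:5672
2015-10-19 16:47:42.906 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.052 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.151 11796 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.541 11796 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.543 11763 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:43.569 11795 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:43.607 11763 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on controller3:5672
2015-10-19 16:47:44.163 11795 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
2015-10-19 16:47:44.326 11764 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111]
"""

# Assumed layout: date, time, PID, level, logger name, "[-]", message.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[-\] (?P<msg>.*)$"
)


def group_by_pid(text):
    """Return {pid: [(time, kind), ...]} in the order the lines appear."""
    events = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that does not look like a log line
        msg = m.group("msg")
        if msg.startswith("Reconnected"):
            # Keep the host:port the process reconnected to.
            kind = "reconnected " + msg.rsplit(" ", 1)[-1]
        elif "unreachable" in msg:
            kind = "unreachable"
        else:
            kind = "error"
        events[int(m.group("pid"))].append((m.group("time"), kind))
    return dict(events)


if __name__ == "__main__":
    for pid, evs in sorted(group_by_pid(LOG).items()):
        print(pid)
        for time, kind in evs:
            print("   ", time, kind)
```

Grouped this way, the excerpt contains four distinct PIDs (11763, 11764, 11795, 11796), whereas the nova-scheduler log shows only one (28727). Each nova-api process reports its own reconnect attempts, and some of them log a new error after they have already reconnected.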