Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

nova services fails intermittently to connect to rabbitmq with timeout error

I am using Openstack Pike with RabbitMQ 3.6.5 on Erlang 18.3.4.4 as the messaging queue. When the nova services are restarted, one of the nova services (mostly nova-scheduler) will have error in the logs as shown below. It says the service cannot connect to the rabbitmq and the error appears on and on until it is restarted. Instance launch also fails due to this error. The service openstack-nova-<service> start commands exits successfully.

2018-05-31 06:19:44.737 5510 INFO nova.service [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Starting scheduler node (version 16.1.1-1.el7) 2018-05-31 06:19:44.745 5510 DEBUG nova.service [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Creating RPC server for service scheduler start /usr/lib/python2.7/site-packages/nova/service.py:179

2018-05-31 06:19:44.749 5510 DEBUG oslo.messaging._drivers.pool [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Pool creating new connection create /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/pool.py:143

2018-05-31 06:19:44.753 5510 DEBUG oslo.messaging._drivers.impl_rabbit [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] [57700d60-c761-4072-b4a3-d33706864086] Connecting to AMQP server on localhost:5672 __init__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:597

2018-05-31 06:19:44.761 5510 DEBUG nova.scheduler.host_manager [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Found 2 cells: 00000000-0000-0000-0000-000000000000, 4963dd3d-246d-4f9b-9d4b-0cb1ab3032ae _load_cells /usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py:642 2018-05-31 06:19:44.761 5510 DEBUG nova.scheduler.host_manager [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] START:_async_init_instance_info _async_init_instance_info /usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py:421

2018-05-31 06:19:44.763 5510 DEBUG oslo_concurrency.lockutils [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270

2018-05-31 06:19:44.764 5510 DEBUG oslo_concurrency.lockutils [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282

2018-05-31 06:19:49.761 5510 DEBUG oslo.messaging._drivers.impl_rabbit [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] [57700d60-c761-4072-b4a3-d33706864086] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:744

2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last): 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit return fun(args, *kwargs) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 569, in __call__ 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit self.revive(self.connection.default_channel) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 819, in default_channel 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit self.connection 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 802, in connection 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit self._connection = self._establish_connection() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 757, in _establish_connection 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit conn = self.transport.establish_connection() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit conn.connect() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 300, in connect 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit self.drain_events(timeout=self.connect_timeout) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 464, in drain_events 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit return self.blocking_read(timeout) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 468, in blocking_read 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit frame = self.transport.read_frame() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 237, in read_frame 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit frame_header = read(7, True) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 377, in _read 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit s = recv(n - len(rbuf)) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 354, in recv 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit return self._recv_loop(self.fd.recv, b'', bufsize, flags) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 348, in _recv_loop 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit self._read_trampoline() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 319, in _read_trampoline 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit timeout_exc=socket.timeout("timed out")) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 203, in _trampoline 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit mark_as_closed=self._mark_as_closed) 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 162, in trampoline 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit return hub.switch() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit return self.greenlet.switch() 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit timeout: timed out 2018-05-31 06:19:49.761 5510 ERROR oslo.messaging._drivers.impl_rabbit 2018-05-31 06:19:49.763 5510 ERROR oslo.messaging._drivers.impl_rabbit [req-ba58e560-cca2-43ee-a770-5bc94bc14624 - - - - -] [57700d60-c761-4072-b4a3-d33706864086] AMQP server on 127.0.0.1:5672 is unreachable: timed out. Trying again in 1 seconds. Client port: None: timeout: timed out

The rabbit mq logs shows frequent TCP connection closures :

=INFO REPORT==== 31-May-2018::06:39:41 === accepting AMQP connection <0.957.0> (127.0.0.1:33456 -> 127.0.0.1:5672)

=ERROR REPORT==== 31-May-2018::06:40:11 === closing AMQP connection <0.948.0> (127.0.0.1:33450 -> 127.0.0.1:5672): {handshake_timeout,frame_header}

Can anyone please shed some light on how I can further debug this please ? Thanks.