nova [Errno 111] ECONNREFUSED after migrating rabbitmq to a separate host

asked 2020-03-29 09:16:18 -0500

Ion42

updated 2020-04-12 06:31:18 -0500

Because of a memory shortage on the control node (controller), I had to move the RabbitMQ message queue service to a separate server (rabbit). The message queue on the new server shows a fair number of queues (150), but fewer than the control node had before (191).

nova-conductor.log now shows connection problems whenever I try to create a new instance:

2020-03-29 15:55:22.884 5502 ERROR oslo.messaging._drivers.impl_rabbit [req-bfb76a17-41c6-45bb-b397-c1c575f43c10 e685349ef4ec43cba929131ccd7b81fa 89dd119cc70a4e35b2cb6a2dafcb2d02 - default default] Connection failed: [Errno 111] ECONNREFUSED (retrying in 32.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED

My understanding is that I need to change "only" the parameter "transport_url" in /etc/nova etc. ...

root@control:~# find /etc -type f -exec grep ^transport_url {} \;
transport_url = rabbit://openstack:***@rabbit
transport_url = rabbit://openstack:***@rabbit
transport_url = rabbit://openstack:***@rabbit
transport_url = rabbit://openstack:***@rabbit
root@control:~# find /etc -type f -exec grep -l ^transport_url {} \;

... but this is obviously not enough.
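For reference, a transport_url can be broken into its parts to see which host, port and vhost a service will actually try to reach. This is a minimal sketch using only the Python standard library; the helper name `describe_transport_url` and the sample password are made up for illustration, and the fallbacks to port 5672 and vhost "/" are assumptions about what applies when the URL omits them:

```python
from urllib.parse import urlsplit


def describe_transport_url(url):
    """Split a rabbit:// URL into the parts that decide where a client connects."""
    parts = urlsplit(url)
    # Everything after the first "/" that follows the host is the vhost name.
    vhost = parts.path[1:] if parts.path.startswith("/") else ""
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5672,  # assumed default AMQP port
        "vhost": vhost or "/",       # assumed default vhost when none is given
    }


if __name__ == "__main__":
    # Hypothetical password; host and user are the ones from the config above.
    print(describe_transport_url("rabbit://openstack:secret@rabbit"))
    print(describe_transport_url("rabbit://openstack:secret@rabbit:5672/openstack"))
```

A URL without a trailing path, like the ones found by grep above, yields vhost "/", which is consistent with the vhost named in the nova-compute error further down.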

On the compute host I find such messages in /var/log/nova/nova-compute.log:

2020-03-29 14:05:11.973 7780 ERROR oslo_service.periodic_task [req-1692dd36-2dd8-4a69-9372-50c190ad67db - - - - -] Error during ComputeManager._sync_scheduler_instance_info: oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on rabbit:5672 after inf tries: Basic.publish: (404) NOT_FOUND - no exchange 'scheduler_fanout' in vhost '/'
2020-03-29 14:05:11.973 7780 ERROR oslo_service.periodic_task Traceback (most recent call last):

RabbitMQ on the new server does not seem to be hitting any limits:

root@master:~# rabbitmqctl status | grep 'file_descriptors' -A 10
{file_descriptors,

Any hint which config change I forgot?

Additional information requested in the comments (much appreciated!):

root@master:~# rabbitmqctl list_vhosts
Listing vhosts
root@master:~# rabbitmqctl list_permissions -p openstack
Listing permissions in vhost "openstack"
Error: no_such_vhost: openstack

root@master:~# rabbitmqctl list_queues -p openstack | grep scheduler_fanout

Interesting: the scheduler fanout queues are missing from the queue list on the new server!

In the queue list before, it was there:

root@control:~# cat rabbitmq_queues.before.sorted | grep scheduler_fanout
cinder-scheduler_fanout_f6e79fd5faec42b7b39d340c40203a1d        0
scheduler_fanout_7d3b075d00b54bdd89f9a482adc76c2d       0
scheduler_fanout_d03c3305050c4e249c79e9e153dea10d       0
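A before/after comparison like this can be automated. The sketch below assumes both listings are plain `rabbitmqctl list_queues` output with one "name count" pair per line, and that the 32-character hex suffix on fanout queues is random and should be ignored; the function names and the "after" sample are hypothetical:

```python
import re

# Fanout queue names end in a random 32-character hex suffix; strip it before comparing.
_SUFFIX = re.compile(r"_[0-9a-f]{32}$")


def queue_kinds(listing):
    """Return the queue names found in a listing, with random suffixes removed."""
    kinds = set()
    for line in listing.splitlines():
        fields = line.split()
        # Keep only "name count" rows; header lines such as "Listing queues" are skipped.
        if len(fields) == 2 and fields[1].isdigit():
            kinds.add(_SUFFIX.sub("", fields[0]))
    return kinds


def missing_after(before, after):
    """Queue kinds present in the old listing but absent from the new one."""
    return sorted(queue_kinds(before) - queue_kinds(after))


if __name__ == "__main__":
    before = (
        "cinder-scheduler_fanout_f6e79fd5faec42b7b39d340c40203a1d 0\n"
        "scheduler_fanout_7d3b075d00b54bdd89f9a482adc76c2d 0\n"
    )
    # Hypothetical listing from the new server, for illustration only.
    after = "Listing queues\ncinder-scheduler_fanout_0123456789abcdef0123456789abcdef 0\n"
    print(missing_after(before, after))
```

Feeding it the contents of rabbitmq_queues.before.sorted and a fresh listing from the new server would show which kinds of queue were never recreated.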

To check the RabbitMQ connection, I followed the instructions here for a check client, and it connected successfully:

root@control:~# ./ --server master --username openstack --password ****

I'll switch back to the old config in order to have more before/after information.



Did you restart the nova services? Regarding the message queue, I would check whether any ulimit is being hit and compare the settings between the old and new rabbit servers.

eblock ( 2020-03-30 05:00:36 -0500 )

Yes, restarting the services and the whole control/compute servers did not help. The limits are not reached either; I've added the rabbit status output to the question.

Ion42 ( 2020-04-05 14:20:21 -0500 )

Could you share the output of:

control:~ # rabbitmqctl list_vhosts
control:~ # rabbitmqctl list_permissions -p openstack
control:~ # rabbitmqctl list_queues -p openstack | grep scheduler_fanout
eblock ( 2020-04-06 02:00:15 -0500 )

I would first try to solve/debug the connection refused error. Did you try to connect to rabbitmq from this host yourself, for example with nmap or a small Python script?

Dev.Faz ( 2020-04-06 23:40:58 -0500 )
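A small Python script of the kind suggested above could look like the following. It is only a sketch: the function name `can_connect` is made up, port 5672 is taken from the nova-compute error message, and the check covers plain TCP reachability only, not the AMQP login or the vhost:

```python
import socket


def can_connect(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # ECONNREFUSED (errno 111 on Linux) means nothing is listening on that
        # address/port, or a firewall is actively rejecting the connection.
        print(f"connect to {host}:{port} failed: {exc}")
        return False


if __name__ == "__main__":
    # "rabbit" is the host name used in the transport_url from the question.
    print(can_connect("rabbit"))
```

Running it on the controller and on the compute host shows whether each of them can reach the broker at all. If it reports a refused connection, it may be worth checking which address RabbitMQ is listening on and whether the host name resolves to the new server.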

You might need to add the vhost "openstack" and grant permissions. I believe the deployment guide is either not entirely correct or incomplete. In our older OpenStack (Ocata) we also have only one vhost, /; in our new deployment (Train) we had to add the vhost openstack manually.

eblock ( 2020-04-12 11:20:34 -0500 )
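Adding the vhost and granting permissions, as suggested above, comes down to two rabbitmqctl calls. The sketch below only builds and prints them by default; the helper names are hypothetical, and using "openstack" for both the vhost and the user, with full ".*" permissions, is an assumption based on the names that appear in this thread:

```python
import shlex
import subprocess


def vhost_setup_commands(vhost="openstack", user="openstack"):
    """Build the rabbitmqctl calls that create a vhost and grant a user full access."""
    return [
        ["rabbitmqctl", "add_vhost", vhost],
        # The three ".*" patterns are the configure, write and read permissions.
        ["rabbitmqctl", "set_permissions", "-p", vhost, user, ".*", ".*", ".*"],
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Needs to run as root on the RabbitMQ host.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(vhost_setup_commands())  # dry run: only prints the commands
```

If the services are meant to use that vhost, the transport_url would also need it as a path component, for example rabbit://openstack:***@rabbit/openstack, followed by a restart of the affected services.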