Cannot launch instances anymore due to RabbitMQ errors

asked 2020-05-27 15:08:30 -0500

awalters

updated 2020-05-27 15:31:58 -0500

I used to be able to launch instances from this setup (Packstack single-node install, Queens) with no problem. I haven't made changes to the network or anything within OpenStack in months. Now, for whatever reason, I can no longer launch instances: they get stuck in Scheduling. Looking in the nova-scheduler log, I can see RabbitMQ errors, and the RabbitMQ log is full of:

=ERROR REPORT==== 27-May-2020::15:58:26 ===

closing AMQP connection <0.17487.0> (<node's ip>:47928 -> <node's ip>:5672):

{handshake_timeout,frame_header}

I've tried restarting the service and even a full reboot, I've reset the guest password to guest, and I've wiped mnesia. None of these made any difference at all.

Looking back at earlier logs, I can see these errors appeared occasionally as far back as my logs go (January), but back then the setup still worked at least some of the time. Now it gives the error every time.

rabbitmqctl list_connections shows a number of running connections, but according to the logs the most recent one was initiated 4 days ago. I've rebooted and restarted the services numerous times since then, so that seems odd.
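For anyone hitting the same symptom, one way to see where the handshake stalls is to perform its first step by hand. The following is a minimal sketch in Python, not a RabbitMQ tool: the names probe_amqp, recv_exact and parse_frame_header are made up for illustration. It opens a TCP connection to the broker port, sends the 8-byte AMQP 0-9-1 protocol header, and times how long the broker takes to send back its first frame header. It assumes the broker speaks AMQP 0-9-1 on port 5672, as in the log above.

```python
import socket
import struct
import time

# Protocol header a client sends first when speaking AMQP 0-9-1.
AMQP_PROTOCOL_HEADER = b"AMQP\x00\x00\x09\x01"


def recv_exact(sock, n):
    """Read exactly n bytes from sock or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during the handshake")
        buf += chunk
    return buf


def parse_frame_header(header):
    """Split a 7-byte AMQP frame header into (frame_type, channel, payload_size)."""
    return struct.unpack(">BHI", header)


def probe_amqp(host, port=5672, timeout=5.0):
    """Send the protocol header and time the broker's first frame header.

    Raises an OSError subclass (for example a timeout) if the broker
    cannot be reached or does not answer within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        sock.sendall(AMQP_PROTOCOL_HEADER)
        frame_type, channel, size = parse_frame_header(recv_exact(sock, 7))
    elapsed = time.monotonic() - start
    return frame_type, channel, size, elapsed
```

Calling probe_amqp with the node's IP should normally return quickly with a method frame (type 1) on channel 0, which is the broker's Connection.Start. A long delay or a timeout at this step suggests the broker itself is slow to respond, while a fast reply points at something later in the client's side of the handshake.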


Comments

What about the status of the cluster (rabbitmqctl cluster_status)? And the queue list (rabbitmqctl list_queues)?

chalans ( 2020-05-27 16:27:35 -0500 )

list_queues gives a very long list of queues, which are all at 0. Not sure if there's a specific one I should be looking at/for? scheduler.<my node name> is there. And cluster_status shows the node as running.

awalters ( 2020-05-27 19:14:22 -0500 )

1 answer


answered 2020-05-29 14:26:19 -0500

awalters

Well, this was a fun one. This is a small test box and I wasn't actively running anything on it at the moment, so I didn't notice that a firewall change (not made by me) between this box and the internet had broken its internet connectivity. Even though the node was only trying to communicate with itself, via IP as far as I can tell, it must have been trying to do something with DNS at some point in the handshake (DNS was set to 8.8.8.8, which it couldn't reach). I restored connectivity and restarted the services, and it's fine now.
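The DNS involvement above is a guess rather than something confirmed from the RabbitMQ side, but it is easy to test on a box in the same state. The following is a minimal sketch in Python, with hypothetical helper names (time_lookup, check_name_resolution), that times a forward lookup of the local hostname and a reverse lookup of the node's IP. If either takes several seconds while the configured resolver is unreachable, that delay alone could plausibly push a handshake past the broker's timeout.

```python
import socket
import time


def time_lookup(label, func, *args):
    """Run one resolver call and return (label, seconds_taken, outcome)."""
    start = time.monotonic()
    try:
        result = func(*args)
        outcome = "ok: %r" % (result,)
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        outcome = "failed: %s" % exc
    elapsed = time.monotonic() - start
    return label, elapsed, outcome


def check_name_resolution(ip):
    """Time the lookups a service might trigger even when talking to itself."""
    return [
        time_lookup("forward lookup of local hostname",
                    socket.gethostbyname, socket.gethostname()),
        time_lookup("reverse lookup of node IP",
                    socket.gethostbyaddr, ip),
    ]
```

Running check_name_resolution with the node's own IP and printing each tuple shows how long every lookup took. Adding the node's hostname and IP to /etc/hosts is a common way to keep such lookups local, so they do not depend on an external resolver like 8.8.8.8.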


Stats

Asked: 2020-05-27 15:08:30 -0500

Seen: 88 times

Last updated: May 29