RabbitMQ debugging

asked 2019-11-05 12:36:06 -0600

dabovard

Hello everyone,

I deployed OpenStack using Kolla-Ansible and was able to get it up and running (3 nodes: controller, compute, neutron).

All of my services (as far as I'm aware right now) are working except for Nova. I was able to access the dashboard, create a router and a network, upload an image, and create a security group. However, as soon as I attempted to launch an instance, I got an error.

To me it's very bizarre, because I have verified that RabbitMQ established connections with each node over different ports for the OS services, yet this particular "launch instance" transaction is failing.

My issue is that I don't know how to debug RabbitMQ and get to the root of the problem. When I try to launch an instance, the log files show MessagingTimeout errors (associated with RabbitMQ) between the controller node (where RabbitMQ is installed) and the compute node. There aren't any error indicators in my RabbitMQ logs. The nova-compute logs on the compute node show that it never receives the request from the controller, and eventually the controller times out.

Have any of you ever had similar issues with rabbitMQ? If so, how did you tackle/debug it?

OpenStack says the "MessagingTimeout" error occurs when the process gets stuck at any of steps 2 through 5 below:

  1. Client -> request -> rabbitMQ
  2. rabbitMQ -> request -> Server
  3. Server processes request and produces response (in this case launching an instance)
  4. Server -> response -> rabbitMQ
  5. rabbitMQ -> response -> Client

In my circumstance the client is the controller, and the server is the compute node (I believe), and it seems it's failing somewhere in step 2. It's not step 1 because then it wouldn't throw this error, and it's not step 3 because the compute node never gets the request...
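The five steps above can be sketched as a toy model using only the Python standard library. This is not oslo.messaging or RabbitMQ code: the two `queue.Queue` objects stand in for the broker, and `MessagingTimeout`, `call`, `serve_once`, and `demo` are hypothetical names made up for this illustration. It only shows why a request that is published (step 1) but never consumed (step 2) looks like a timeout on the client side, with the request still sitting in the queue.

```python
import queue
import threading


class MessagingTimeout(Exception):
    """Local stand-in for the timeout error; NOT the oslo.messaging class."""


def call(request_q, reply_q, payload, timeout):
    """Client side: publish a request (step 1), then wait for the reply (step 5)."""
    request_q.put(payload)                      # step 1: client -> broker
    try:
        return reply_q.get(timeout=timeout)     # step 5: broker -> client
    except queue.Empty:
        raise MessagingTimeout(f"no reply within {timeout}s") from None


def serve_once(request_q, reply_q):
    """Server side: consume one request, process it, publish the response."""
    payload = request_q.get()                   # step 2: broker -> server
    result = f"done: {payload}"                 # step 3: server does the work
    reply_q.put(result)                         # step 4: server -> broker


def demo(server_consumes):
    """Run one call; optionally leave the server disconnected from the queue."""
    request_q, reply_q = queue.Queue(), queue.Queue()
    if server_consumes:
        threading.Thread(
            target=serve_once, args=(request_q, reply_q), daemon=True
        ).start()
    try:
        return call(request_q, reply_q, "launch instance", timeout=0.5)
    except MessagingTimeout:
        # The request is still queued, so it was never delivered (step 2 failed).
        return f"timeout, undelivered requests: {request_q.qsize()}"


if __name__ == "__main__":
    print(demo(server_consumes=True))
    print(demo(server_consumes=False))
```

In this model, a message left sitting in the request queue after the timeout is the signature of a step 2 failure. On a real deployment, the analogous check would be whether the compute node's queue on the broker has a consumer attached and whether messages accumulate in it.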

Any suggestions/help/resources are greatly appreciated!


Comments

Could be a timing problem. Message queues require accurate timing; if your servers' clocks are not sufficiently synchronized, messages can get ignored.

Bernd Bausch (2019-11-05 18:27:37 -0600)