nova-compute node stuck in down state

asked 2016-06-02 03:46:10 -0500

yee379

Hi,

I had to hard reboot a compute node running CentOS 7 with RDO Liberty.

When it came back up, iptables complained during the restore: it didn't like statements such as -A neutron-openvswi-i22d31416-1 -m set --match-set NIPv44c9e9c39-028f-44d6-a89d- src -j RETURN. So I started iptables with a blank rule set and reinserted the rules without these NIP statements.
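For anyone repeating that step, the filtering can be sketched in a few lines of Python. This is only an illustration, assuming the rules are available as iptables-save style text lines; the function name drop_match_set_rules and the "NIP" prefix default are hypothetical choices based on the rule quoted above.

```python
def drop_match_set_rules(lines, set_prefix="NIP"):
    """Return the rule lines without those that match on an ipset whose
    name starts with set_prefix (hypothetical helper, not a neutron tool)."""
    kept = []
    for line in lines:
        tokens = line.split()
        if "--match-set" in tokens:
            idx = tokens.index("--match-set")
            # The token after --match-set is the ipset name.
            if idx + 1 < len(tokens) and tokens[idx + 1].startswith(set_prefix):
                continue
        kept.append(line)
    return kept
```

Feeding it the saved rule file line by line would leave every rule except the --match-set NIP... ones, which is what was reinserted by hand here.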

Then I noticed that none of my qvo interfaces on br-int existed:

        Interface "qvo1f6e4987-de"
            error: "could not open network device qvo1f6e4987-de (No such device)"

Hmm... sure enough, ip a showed only my hypervisor ports. :/

Finally, when I check the hypervisor status in Horizon, it shows status enabled but state down. Looking through the nova logs on the resurrected compute node, I see this every 3 minutes:

2016-06-02 01:36:47.059 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed
2016-06-02 01:36:48.100 25861 INFO oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Reconnected to AMQP server on 172.23.99.199:5671
2016-06-02 01:39:48.102 25861 DEBUG oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:615
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 664, in execute_method
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     method()
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 999, in _publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     producer.publish(msg, expiration=expiration)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 172, in publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     routing_key, mandatory, immediate, exchange, declare)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 188, in _publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     mandatory=mandatory, immediate=immediate,
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 2130, in basic_publish_confirm
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     self.wait([(60, 80)])
2016-06-02 01:39:48.102 ...
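
The "every 3 minutes" pattern can be checked directly from the timestamps. The sketch below is an illustration only: it uses the first three log lines above verbatim and measures the time from each "Reconnected" line to the next log line. The helper names are hypothetical.

```python
from datetime import datetime

# First three log lines from the post, verbatim.
LOG = """\
2016-06-02 01:36:47.059 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed
2016-06-02 01:36:48.100 25861 INFO oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Reconnected to AMQP server on 172.23.99.199:5671
2016-06-02 01:39:48.102 25861 DEBUG oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:615
"""


def timestamp(line):
    # The log timestamp is the first 23 characters, e.g. 2016-06-02 01:36:48.100
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def seconds_until_next_event(lines, marker="Reconnected to AMQP server"):
    """For each line containing marker, return seconds until the following line."""
    gaps = []
    for prev, nxt in zip(lines, lines[1:]):
        if marker in prev:
            gaps.append((timestamp(nxt) - timestamp(prev)).total_seconds())
    return gaps


gaps = seconds_until_next_event(LOG.splitlines())
print(gaps)
```

On these lines the gap comes out at roughly 180 seconds. A fixed interval like that often suggests a timer somewhere on the path rather than a credentials problem, though nothing in this thread confirms the cause.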

Comments

Check the status of the neutron-openvswitch-agent service, which is responsible for creating the bridges and connecting to the tunnel.

Eduardo Gonzalez (2016-06-02 04:15:36 -0500)

Thanks Eduardo; I get the same errors from neutron-openvswitch: 2016-06-02 03:27:41.760 1928 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed

yee379 (2016-06-02 05:32:13 -0500)

You have RabbitMQ issues; check connections, auth, etc.

Eduardo Gonzalez (2016-06-02 05:34:07 -0500)

I thought so too, but I've checked all settings, open ports, iptables, etc. I have even compared the configs line by line with another hypervisor: they are identical to those on hypervisors that do work. There are also no errors on the RabbitMQ server side.

yee379 (2016-06-02 05:51:52 -0500)

I guess you have already tried telnet rabbit_ip 5671 and it connects. This is off-topic, but are the OVS tunnels created with the neutron server?

Eduardo Gonzalez (2016-06-02 05:56:42 -0500)
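
A plain telnet only proves the TCP port is reachable. As a rough sketch, assuming the broker on 5671 expects TLS (5671 is the conventional AMQPS port, but the thread does not show the broker config), the check can be extended to attempt a TLS handshake as well. The function name probe_amqps is hypothetical, and certificate verification is deliberately switched off because this only tests whether a handshake completes.

```python
import socket
import ssl


def probe_amqps(host, port=5671, timeout=5.0):
    """Return (tcp_ok, tls_ok, detail) for a TCP connect plus TLS handshake."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return (False, False, "TCP connect failed: %s" % exc)

    # Assumption: no certificate verification, this is a reachability probe only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        sock.close()
        return (True, False, "TLS handshake failed: %s" % exc)
    with tls:
        return (True, True, "TLS handshake ok, protocol %s" % tls.version())
```

Calling probe_amqps("172.23.99.199") from the affected compute node and from a working one would show whether they differ at the TCP or the TLS step. If the broker requires client certificates, the handshake may fail on both, which is still a useful comparison.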

1 answer

answered 2016-06-02 08:00:58 -0500

kildarejoe

Have you run the neutron-openvswitch-agent from the command line?

Running /usr/bin/neutron-openvswitch-agent should show you either connecting to the rabbit queue successfully or where in the process it is having a problem. Once the agent connects to the rabbit cluster and creates the channel to publish/consume messages, it should notify all other compute/neutron nodes to create the overlay networks for layer 2 communication.

Then do an ovs-vsctl show to see the veth pair interface connections and check that the gre/vxlan tunnels have been established.
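
If the ovs-vsctl show output is long, the broken interfaces can be picked out with a small script. This is a sketch only, assuming the output has the same shape as the excerpt in the question (an Interface line followed by an error: line); the function name is hypothetical and the sample is copied from the question.

```python
import re

# Excerpt of ovs-vsctl show output quoted in the question.
SAMPLE = '''\
        Interface "qvo1f6e4987-de"
            error: "could not open network device qvo1f6e4987-de (No such device)"
'''


def interfaces_with_errors(show_output):
    """Map interface name to error text for interfaces flagged with an error."""
    errors = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        m = re.match(r'Interface "?([^"]+)"?$', line)
        if m:
            current = m.group(1)  # remember which interface we are under
            continue
        m = re.match(r'error: "(.*)"$', line)
        if m and current is not None:
            errors[current] = m.group(1)
    return errors


for name, err in interfaces_with_errors(SAMPLE).items():
    print(name, "->", err)
```

In practice the text would come from running ovs-vsctl show on the compute node and passing its output to interfaces_with_errors.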


Comments

Hi: yeah, the logs look normal. I see Connecting to AMQP server on 172.23.99.199:5671, then Connected to AMQP server on 172.23.99.199:5671, and then after 3 minutes AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed, followed by Reconnected to AMQP server on 172.23.99.199:5671.

yee379 (2016-06-02 13:16:51 -0500)

Stats

Asked: 2016-06-02 03:46:10 -0500

Seen: 2,449 times

Last updated: Jun 02 '16