nova-compute node stuck in down state

Hi,

I had to hard-reboot a compute node running CentOS 7 with RDO Liberty.

When it came back up, it complained about iptables: the restore rejected statements like -A neutron-openvswi-i22d31416-1 -m set --match-set NIPv44c9e9c39-028f-44d6-a89d- src -j RETURN. So I started iptables with an empty ruleset and reinserted the rules without these NIP statements.
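
For anyone wanting to reproduce that cleanup, here is a minimal sketch of the filtering step. It assumes the rules are in iptables-save format, and the function name and sample rules are hypothetical. Dropping these rules also drops the security-group filtering they implemented, so treat it as a stopgap only.

```python
def strip_match_set_rules(dump_lines):
    """Return the iptables-save lines that do not reference an ipset.

    Assumption: the restore failed on "-m set --match-set <name>" rules
    because the named ipset did not exist after the reboot.
    """
    kept = []
    for line in dump_lines:
        tokens = line.split()
        # Only rule lines ("-A chain ...") are candidates for removal;
        # table headers, chain declarations and COMMIT are kept as-is.
        if line.startswith("-A") and "--match-set" in tokens:
            continue
        kept.append(line)
    return kept


# Hypothetical sample: one ipset-based rule and one ordinary rule.
sample = [
    "*filter",
    ":neutron-openvswi-i22d31416-1 - [0:0]",
    "-A neutron-openvswi-i22d31416-1 -m set --match-set NIPv44c9e9c39-028f-44d6-a89d- src -j RETURN",
    "-A neutron-openvswi-i22d31416-1 -m state --state RELATED,ESTABLISHED -j RETURN",
    "COMMIT",
]

for rule in strip_match_set_rules(sample):
    print(rule)
```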

Then I noticed that none of my qvo interfaces on br-int existed:

        Interface "qvo1f6e4987-de"
            error: "could not open network device qvo1f6e4987-de (No such device)"

Hmm... sure enough, ip a showed only my hypervisor ports. :/
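
A small sketch for listing every port in that state, assuming the excerpt above is saved output from ovs-vsctl show (the function name is hypothetical):

```python
import re

# Matches the error text exactly as it appears in the excerpt above.
_MISSING_DEVICE = re.compile(
    r"could not open network device (\S+) \(No such device\)"
)


def missing_devices(ovs_show_text):
    """Return the device names that Open vSwitch reports as missing."""
    return _MISSING_DEVICE.findall(ovs_show_text)


excerpt = '''
        Interface "qvo1f6e4987-de"
            error: "could not open network device qvo1f6e4987-de (No such device)"
'''

print(missing_devices(excerpt))
```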

Finally, when I check the hypervisor in Horizon, it shows status enabled but state down. Looking through the nova logs on the resurrected compute node, I see this every 3 minutes:

2016-06-02 01:36:47.059 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed
2016-06-02 01:36:48.100 25861 INFO oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Reconnected to AMQP server on 172.23.99.199:5671
2016-06-02 01:39:48.102 25861 DEBUG oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:615
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 664, in execute_method
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     method()
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 999, in _publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     producer.publish(msg, expiration=expiration)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 172, in publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     routing_key, mandatory, immediate, exchange, declare)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 188, in _publish
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     mandatory=mandatory, immediate=immediate,
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 2130, in basic_publish_confirm
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     self.wait([(60, 80)])
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit     raise m
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2016-06-02 01:39:48.102 25861 ERROR oslo.messaging._drivers.impl_rabbit
2016-06-02 01:39:48.104 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed
2016-06-02 01:39:49.144 25861 INFO oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] Reconnected to AMQP server on 172.23.99.199:5671
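
To check the "every 3 minutes" pattern against a longer log, the gaps between disconnects can be measured with a sketch like this. The helper names are hypothetical, and it assumes each log line starts with a timestamp in the format shown above.

```python
from datetime import datetime


def disconnect_times(log_lines):
    """Collect the timestamps of 'closed the connection' messages."""
    times = []
    for line in log_lines:
        if "closed the connection" in line:
            # The first two whitespace-separated fields are date and time.
            stamp = " ".join(line.split()[:2])
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    return times


def gaps_in_seconds(times):
    """Return the seconds elapsed between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# The two disconnect messages from the log above.
log = [
    "2016-06-02 01:36:47.059 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed",
    "2016-06-02 01:39:48.104 25861 ERROR oslo.messaging._drivers.impl_rabbit [req-beba7b52-6391-4666-ba56-62465cbf4aab - - - - -] AMQP server 172.23.99.199:5671 closed the connection. Check login credentials: Socket closed",
]

print(gaps_in_seconds(disconnect_times(log)))
```

For the excerpt above this gives a gap of about 181 seconds: 180 seconds between the reconnect and the next failed publish, plus roughly one second to reconnect.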

I'm pretty sure the network side is good: the RabbitMQ port is open and accessible, and the login details are correct.
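
For completeness, this is roughly how the port check can be done from the compute node. It is a minimal sketch with a hypothetical function name, and it only confirms that a TCP connection can be opened. It does not test the TLS handshake or the RabbitMQ credentials.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling tcp_port_open("172.23.99.199", 5671) on the compute node checks the broker address that appears in the log.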

I've tried toggling the compute service, but that doesn't help. Any ideas?