Instances are not reachable after restarting one of the servers

asked 2017-02-13 15:46:51 -0500

ampena

updated 2017-02-14 07:19:11 -0500

After I restarted one of the servers during some physical tests, the instances are no longer reachable, not even with ping. I accessed one of them through the OpenStack web console and found that they are not receiving any configuration from OpenStack. As a consequence they cannot reach the gateway, although they can ping each other because they are on the same LAN.

This OpenStack deployment was installed following the Ubuntu Autopilot guide from the home web page.

Using the Juju GUI, I found that the following applications are installed on the server I restarted: Neutron-gateway, Base-machine, Juju-gui, Ceph-osd, Ceph-radosgw, Ceilometer, Neutron-api, Keystone, Rabbitmq-server.

I found two services that are not running: neutron-openvswitch-agent and neutron-vpn-agent. Their logs follow.

for neutron-openvswitch-agent:

    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 322, in _report_state
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     True)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 87, in report_state
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return method(context, 'report_state', **kwargs)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
    2017-02-08 18:23:25.312 137224 ERROR neutron.plugins.ml2 ...

Comments

Neither service gets a reply from the message queue. This can be a networking problem or a timing problem (NTP server). The VPN agent also can't access the DB at 10.232.213.67. I'd check the networking first; try reaching the DB and MQ manually.

Bernd Bausch (2017-02-14 05:29:09 -0500)

I tried reaching the DB and MQ, and both are reachable. Both servers are also time-synchronized. The problem may be something else.

ampena (2017-02-14 07:10:21 -0500)

"AMQP server on 10.232.213.58:5672 is unreachable: [Errno 111] ECONNREFUSED" is quite explicit. You need to find out why the connection is refused. So your services run in containers? Perhaps container networking is a different cup of tea than normal networking.

Bernd Bausch (2017-02-14 08:05:40 -0500)
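
A host that answers ping can still refuse a TCP connection: ECONNREFUSED usually means the address is reachable but nothing is accepting connections on that port (for example, the service is stopped or listening on a different interface). The sketch below is one possible way to tell "refused" apart from "timeout" when probing the endpoints mentioned in this thread. The helper name `probe_tcp` is hypothetical, and the DB port 3306 in the commented example is an assumption (the MySQL default); the thread only gives the DB address.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error: <details>'.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all: packets are likely dropped or misrouted.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but no process is listening on this port.
            return "refused"
        return "error: {}".format(exc)
    finally:
        sock.close()


# Example calls, to be run from the machine or container hosting the agent:
#   probe_tcp("10.232.213.58", 5672)  # AMQP endpoint quoted in the log
#   probe_tcp("10.232.213.67", 3306)  # DB host; port 3306 is an assumption
```

A "refused" result for 10.232.213.58:5672 would suggest checking whether the RabbitMQ service inside its container came back up after the reboot, while a "timeout" would point more toward routing or bridge problems between containers.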

I have a group of servers, and all the applications are distributed across them. During the installation many containers were created automatically, and everything was working before I restarted one of the servers. I can see all the applications distributed across the containers using the Juju GUI.

ampena (2017-02-14 08:35:56 -0500)