Neutron router hosted on rebuilt controller fails to route as expected.
I have a six-node (5 compute/storage, 1 controller) RDO Kilo OpenStack cluster installed from a packstack answer file. The cluster behaved as expected until the power supply on the controller failed. I put another machine in place as the controller, using the same CentOS 7 base OS and the same packstack answer file, and restored the MySQL database from the failed controller from backups. Glance images are NFS-mounted, and Cinder storage lives on the compute nodes.
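For reference, the database restore amounted to something like the following (the dump filename is illustrative; openstack-service is the helper from the openstack-utils package):

openstack-service stop                       # quiesce OpenStack services during the restore
mysql -u root -p < all-databases-backup.sql  # dump filename illustrative
openstack-service start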
Everything except Neutron appears to be working: I can start and stop instances, log in to them via the web console, and so on. Neutron shows my old security groups, and the instances have the correct security groups applied. All of the Neutron agents report as running (see below). The problem is that my VMs can't reach the outside world (ping, ssh), and the outside world can't reach the VMs (ping, ssh, http).
[root@plume recovery(keystone_admin)]# neutron agent-list
+--------------------+-----------------------+-------+----------------+---------------------------+
| agent_type         | host                  | alive | admin_state_up | binary                    |
+--------------------+-----------------------+-------+----------------+---------------------------+
| Open vSwitch agent | compute-0-0.plume     | :-)   | True           | neutron-openvswitch-agent |
| Open vSwitch agent | plume.usfs-i2.umt.edu | :-)   | True           | neutron-openvswitch-agent |
| Open vSwitch agent | compute-0-3.plume     | :-)   | True           | neutron-openvswitch-agent |
| Open vSwitch agent | compute-0-4.plume     | :-)   | True           | neutron-openvswitch-agent |
| Open vSwitch agent | compute-0-1.plume     | :-)   | True           | neutron-openvswitch-agent |
| L3 agent           | plume.usfs-i2.umt.edu | :-)   | True           | neutron-l3-agent          |
| DHCP agent         | plume.usfs-i2.umt.edu | :-)   | True           | neutron-dhcp-agent        |
| Metadata agent     | plume.usfs-i2.umt.edu | :-)   | True           | neutron-metadata-agent    |
| Open vSwitch agent | compute-0-2.plume     | :-)   | True           | neutron-openvswitch-agent |
+--------------------+-----------------------+-------+----------------+---------------------------+
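Since the L3, DHCP, and metadata agents all run on the controller, I believe the router should exist as a network namespace there; checks of this sort apply (the UUID is a placeholder):

ip netns list                          # expect qrouter-<uuid> and qdhcp-<uuid> entries
ip netns exec qrouter-<uuid> ip addr   # router's qg- (external) and qr- (internal) ports
ip netns exec qrouter-<uuid> ip route  # default route should point at the external gateway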
Using the tcpdump command suggested in the OpenStack Operations Guide, run on the controller node (the only node with a connection to the external network), I get the following results as I ping from various places.
The tcpdump command is:
tcpdump -i any -n -v 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
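Presumably the same filter can also be run inside the router's namespace to see whether traffic reaches the router at all (UUID again a placeholder):

ip netns exec qrouter-<uuid> tcpdump -i any -n -v 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'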
- Ping from external machine to VM: "Destination host unreachable"; no output from tcpdump
- Ping from external machine to controller node: timely ping replies; tcpdump logs every ping and reply
- Ping from VM to router's internal interface: 18 ping replies; tcpdump logs only the first echo request, neither its reply nor any subsequent requests
- Ping from VM to external machine: "Destination host unreachable"; no output from tcpdump
- Ping from VM to router's external interface: timely ping replies; no output from tcpdump
- Ping from external machine to router's external interface: "Destination host unreachable"; no output from tcpdump
- Ping from controller node to router's external interface: "Destination host unreachable"; no output from tcpdump
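The last two results make me suspect the external bridge plumbing on the rebuilt controller, since even the controller itself can't reach the router's gateway address. A check along these lines seems relevant (the interface name is illustrative; packstack normally attaches the external NIC to br-ex):

ovs-vsctl show              # is br-ex present, and is the physical NIC a port on it?
ovs-vsctl list-ports br-ex  # should include the external interface, e.g. enp2s0
ip addr show br-ex          # the controller's external IP should sit on br-ex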
Can anyone help me figure out why Neutron isn't routing as it should? Was it maintaining some state somewhere other than the database that I simply didn't transfer to the replacement controller?