Xenial Queens broken neutron-openvswitch installation

asked 2020-05-27 17:00:13 -0500

jaywatt

Hi!

We've been running an OpenStack environment for the last 2 and a half years with a few hiccups along the way, but mostly with little downtime. Recently we've been trying to add a new piece of hardware to the stack as a nova-compute node to provide more CPU cores and RAM to our VMs. Unfortunately, for some reason, the install is not going well.

We're running Xenial/Queens with Juju and MAAS for deployment/provisioning. We were running Xenial/Pike until December, when we upgraded. We're starting to suspect that the upgrade to Queens is what's causing the trouble, as we were able to add new hardware before the upgrade. We even went as far as removing one of our existing machines that was acting as a nova-compute node and adding it back to the stack, and it too is now exhibiting the same problems as our new hardware.

The root cause of the problems seems to be the neutron-openvswitch application. When we install the nova-compute charm via Juju, everything seems to go smoothly up until the (automatic) installation/configuration of the subordinate neutron-openvswitch charm. Watching the logs, at a certain point during the install, connectivity on our OpenStack admin network (10.10.30.0/24 on eno1) is lost. We're able to force the installation to proceed a bit further by adding a second connection on eno2 (a different external network), but the loss of connectivity on eno1 remains and the compute service isn't able to communicate with the rest of the stack.

Comparing against our other, functional compute nodes, it looks like the admin network bridge (br-eno1) is not being created by the neutron-openvswitch charm. Some part of the process appears to take down eno1 in preparation for creating the bridge, but then fails, leaving the machine unable to communicate on that interface with the rest of the stack.
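For anyone comparing a working node against a broken one, a few standard Open vSwitch and iproute2 commands make the missing bridge easy to spot. This is only a diagnostic sketch; the bridge and interface names (br-eno1, br-data, eno1) are from this environment and may differ on yours:

```shell
# On a node with Open vSwitch tools installed (openvswitch-switch
# package); these touch live OVS state, so shown as comments here:
#   ovs-vsctl list-br             # a working compute node should show br-eno1
#   ovs-vsctl list-ports br-data  # on a broken node, eno1 may show up here

# Quick view of link state on any Linux host (iproute2):
ip -br link show                  # look for eno1 marked DOWN on the broken node
```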

None of our configuration has changed since the upgrade to Queens, but perhaps there is some deprecation or change to the default configuration that came along with the Pike -> Queens upgrade we are unaware of? We've read through the release notes but can't seem to find anything that would explain this behavior.

Any help would be greatly appreciated. I'm including a few segments of log files I think are relevant below but can provide anything else that might be needed. Thanks in advance!

Broken server ifconfig

eno1      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.10.30.101  Bcast:10.10.30.255  Mask:255.255.255.0
          inet6 addr: fe80::4ed9:8fff:fec5:2e3/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:487314 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91955 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:255807482 (255.8 MB)  TX bytes:6693026 ...

1 answer


answered 2020-08-19 15:53:47 -0500

jaywatt

SOLVED!

It turns out that after the upgrade to Queens, Juju was handing out a bad network config to this server. In addition, the Open vSwitch install was assigning eno1 to br-data instead of creating br-eno1 as on my other servers. The steps to resolve the problem were:

  • Remove eno1 from the br-data bridge: ovs-vsctl del-port br-data eno1
  • Copy the functional config from another working server to this server's /etc/network/interfaces file and comment out the line that sources the (busted) cloud config file from /etc/network/interfaces.d/50-cloud-init.cfg
  • Update the IPs in the new interfaces file to those found in ifconfig for the eno1 and eno2 interfaces
  • Reboot
  • Profit
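The steps above can be sketched in shell. The ovs-vsctl call and the reboot are left as comments since they touch live state; the config edit is demonstrated on a scratch copy so it is safe to run anywhere, and the bridge/port names and file path are those from this environment (interfaces.d is the standard Xenial location):

```shell
# Step 1: detach eno1 from the bridge Open vSwitch wrongly claimed it for
# (live command, shown as a comment):
#   sudo ovs-vsctl del-port br-data eno1

# Step 2: comment out the line that sources the broken cloud-init
# fragment. Demonstrated here on a scratch copy of the file:
cat > /tmp/interfaces <<'EOF'
source /etc/network/interfaces.d/50-cloud-init.cfg
auto lo
iface lo inet loopback
EOF
sed -i 's|^source /etc/network/interfaces.d/50-cloud-init.cfg|# &|' /tmp/interfaces
grep '^# source' /tmp/interfaces

# Steps 3-5: paste in the working config from another node, update the
# eno1/eno2 addresses to match this host, then:
#   sudo reboot
```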

I don't yet know exactly what caused Juju to stop sending a proper network config after the upgrade.

My final interfaces file looked like this. Anyone else copying this file will of course have to change all of their IPs.

auto lo
iface lo inet loopback
    dns-nameservers 10.10.30.99 10.244.0.66 10.244.0.67
    dns-search maas

auto eno1
iface eno1 inet manual
    mtu 1500

auto eno2
iface eno2 inet static
    address 10.189.134.103/24
    dns-nameservers 10.189.134.99 10.244.0.66 10.244.0.67
    mtu 1500

auto br-eno1
iface br-eno1 inet static
    address 10.10.30.101/24
    dns-nameservers 10.10.30.99 10.244.0.66 10.244.0.67
    gateway 10.10.30.254
    bridge_ports eno1

I found the following sites helpful while troubleshooting:

  • https://www.server-world.info/en/note?os=Ubuntu_16.04&p=openstack_queens&f=1

  • https://www.manning.com/books/openstack-in-action (paid ebook)


