Compute Nodes Added After Network Creation have dead OVS tap devices

asked 2015-11-13 12:05:02 -0500 by sdub

updated 2015-11-13 13:48:51 -0500

In my deployment of openstack-icehouse, when I configure a compute node before creating my flat network and then restart neutron-dhcp-agent on that node, ovs-vsctl shows the expected result:

[root@vbit10 ~]# ovs-vsctl show
d6fa53f6-df49-47f8-ae8c-92e72eacdefc
Bridge br-vm
    Port phy-br-vm
        Interface phy-br-vm
    Port "eth2"
        Interface "eth2"
    Port br-vm
        Interface br-vm
            type: internal
Bridge br-int
    fail_mode: secure
    Port int-br-vm
        Interface int-br-vm
    Port br-int
        Interface br-int
            type: internal
    Port "tapf0048ae4-6f"
        tag: 1
        Interface "tapf0048ae4-6f"
            type: internal
ovs_version: "2.1.3"

If I instead configure a compute node in exactly the same way after the network has been created, the OVS tap device is tagged 4095, otherwise known as the dead VLAN:

[root@vbit11 ~]# ovs-vsctl show
4f11a547-421e-49c3-ba81-9c403cab0955
Bridge br-int
    fail_mode: secure
    Port int-br-vm
        Interface int-br-vm
    Port "tapda35c485-be"
        tag: 4095
        Interface "tapda35c485-be"
            type: internal
    Port br-int
        Interface br-int
            type: internal
Bridge br-vm
    Port br-vm
        Interface br-vm
            type: internal
    Port "eth2"
        Interface "eth2"
    Port phy-br-vm
        Interface phy-br-vm
ovs_version: "2.1.3"

Any VM hosted on the broken compute node cannot be reached over SSH. You can fix the problem by recreating the subnet and network, then recreating the OVS bridges, and finally restarting neutron-dhcp-agent and neutron-openvswitch-agent. However, I am trying to create an Ansible playbook that brings up an openstack-icehouse deployment in one shot (and this is the last issue I'm dealing with!). I've followed the openstack-icehouse installation guide fairly closely, aside from using a tenant network (ext-network).
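For reference, the manual workaround looks roughly like this. This is only a sketch; the network name, bridge name, NIC, physnet label, and CIDR here are from my setup and will differ in yours:

# Recreate the flat network and subnet (names/CIDR are examples)
neutron net-delete ext-network
neutron net-create ext-network --shared \
    --provider:network_type flat --provider:physical_network physnet1
neutron subnet-create ext-network 10.0.0.0/24 --name ext-subnet

# Recreate the OVS bridge and re-attach the physical NIC
ovs-vsctl --if-exists del-br br-vm
ovs-vsctl add-br br-vm
ovs-vsctl add-port br-vm eth2

# Restart the agents so the tap port is re-bound with a real VLAN tag
service neutron-openvswitch-agent restart
service neutron-dhcp-agent restart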

I've done some diagnosing, and what I've found is that port binding is failing for these devices. When neutron-server goes to bind the port, it finds that the segment is None. It looks up the segment in the ml2_port_bindings table of the neutron database; if it can't find the segment uuid, the binding fails and the port is tagged 4095. I tried to find the other neutron code that modifies the models.PortBinding table, to see what initially populates it, but I can't seem to find anything.

Here's the table I'm talking about: http://pastebin.com/P2z59hVp
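If you want to inspect the bindings yourself, a query like this shows the failed rows (assuming MySQL and the default "neutron" database name; adjust credentials and column list to your schema):

# Failed bindings show vif_type = 'binding_failed' and segment = NULL
mysql -u root -p neutron -e \
    "SELECT port_id, host, vif_type, segment FROM ml2_port_bindings;"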

Any help on this? Thanks!


1 answer


answered 2015-11-20 12:23:39 -0500 by sdub

The problem arose because we run both the DHCP agent and the OVS agent on the compute node (we don't have a network node). The DHCP agent on the compute node was telling neutron-server on the controller to attempt binding before the OVS agent had a chance to report its status to neutron-server.

The solution?

Start the OVS agent, wait 10 seconds, and then start the DHCP agent. There's been a bug open for this for a long time, and it's being pushed off to the M release: https://bugs.launchpad.net/neutron/+bug/1399249
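In shell terms the workaround is just the following (service names are the ones on my RDO nodes):

# Let the OVS agent come up and report its state to neutron-server
# before the DHCP agent triggers port binding
service neutron-openvswitch-agent start
sleep 10
service neutron-dhcp-agent start

In an Ansible playbook, this maps to two service tasks with a pause task in between.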

