How do I debug OVS/VXLAN tenant network issues?

asked 2019-08-13 15:52:48 -0500

rlr

updated 2019-08-15 16:08:58 -0500

I'm working through a Proof of Concept with Stein. I have two physical servers configured as 1) a controller/network/compute (host=poc2) and 2) just a compute (host=poc1).

There are two physical Ethernet ports on each server, and I have successfully set up a provider network on an isolated flat LAN that grants VM instances access. My OpenStack management traffic is on the other LAN. I'm using openvswitch as the ML2 mechanism driver. Connectivity checks out: four VM instances, two on each compute node, can ping each other in a full mesh.

The next step is to create an overlay tenant network using VXLAN over this flat provider network. I appear to have this configured correctly, as I see the tap points set up, and the ovs-vsctl command shows the br-tun with appropriate ports. Additionally, co-hosted VM instances can exchange ping traffic via their tap ports (on poc1 or on poc2), but any attempt to use the overlay/tenant network to ping between servers (poc1 and poc2) fails.

I do have the l2population driver loaded, which I know is supposed to handle the ARP updates. I suspect something is wrong with this, but it is certainly possible that something else is going on.

How do I attack this problem? I can't seem to find any log files that show ARP exchanges or L2pop activities.

Thanks for any guidance. Relevant config follows...

An ARP capture via tcpdump on the controller's physical NIC (em2). Note that no messages come into the tunnel bridge. In this case, the tenant network is the IP address space, and the provider network is

    > [no cksum] VXLAN, flags [I] (0x08), vni 83
    fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has tell, length 28

My openvswitch agent configuration:

My ml2_conf.ini:

    bridge_mappings = provider:br-provider
    integration_bridge = br-int
    tenant_network_type = vxlan

My poc2 (controller/network/compute) server with two VM instances, and em2 as the physical NIC to the provider network:

        [root@poc2 ~(keystone_demo)]# ovs-vsctl show
            Manager "ptcp:6640:"
                is_connected: true
            Bridge br-int
                Controller "tcp:"
                    is_connected: true
                fail_mode: secure
                Port "tap6c607995-02"
                    tag: 1
                    Interface "tap6c607995-02"
                        type: internal
                Port "ovn-89320c-0"
                    Interface ...


I don't know if it's relevant, but in your openvswitch configuration you use a misspelled enable_tunneling (this option is deprecated anyway), and also misspell tunnel_types in one place.

I see a reference to Geneve in br-int. If only for clarity, I would remove it from ml2_conf.ini.

Bernd Bausch (2019-08-13 19:22:55 -0500)
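
A misspelled option name is easy to miss because the agent simply ignores it. The following is a minimal Python sketch of one way to catch that, not an official Neutron tool. The `KNOWN_OPTIONS` table is an assumed, partial list of Open vSwitch agent options, and the sample file, including the `tunel_types` typo and the `192.0.2.10` address, is hypothetical rather than taken from the configuration in the question.

```python
import configparser
import difflib

# Assumed, partial set of option names read by the Neutron Open vSwitch agent.
# Extend it from the configuration reference for your release.
KNOWN_OPTIONS = {
    "ovs": {"bridge_mappings", "integration_bridge", "tunnel_bridge", "local_ip"},
    "agent": {"tunnel_types", "l2_population", "arp_responder", "vxlan_udp_port"},
}


def find_suspect_options(ini_text):
    """Return (section, option, suggestion) for options missing from KNOWN_OPTIONS."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    suspects = []
    for section in parser.sections():
        known = KNOWN_OPTIONS.get(section)
        if known is None:
            continue  # section not covered by this sketch
        for option in parser.options(section):
            if option in known:
                continue
            # Suggest the closest known name, if any is similar enough.
            close = difflib.get_close_matches(option, sorted(known), n=1)
            suspects.append((section, option, close[0] if close else None))
    return suspects


# Hypothetical agent configuration with one deliberate typo.
SAMPLE = """
[ovs]
local_ip = 192.0.2.10
bridge_mappings = provider:br-provider

[agent]
tunel_types = vxlan
l2_population = True
"""

for section, option, suggestion in find_suspect_options(SAMPLE):
    print(f"[{section}] {option}: not a known option, closest match: {suggestion}")
```

An option flagged here is not necessarily wrong, since the table is incomplete. It is only a prompt to look the name up.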

Check whether the poc1 and poc2 configs are identical. Feel free to add the other config files to the question, in particular ml2_conf.ini.

Other than that, my approach would be to trace packets and find out where they get lost. This is tedious, I know.

Bernd Bausch (2019-08-13 19:25:06 -0500)
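
Comparing the two hosts' configuration files by eye is error-prone, so here is a rough Python sketch of an option-by-option comparison. The function names and the sample contents are hypothetical, and the `192.0.2.x` addresses are placeholders. In practice you would read the real files from poc1 and poc2 and pass their text in.

```python
import configparser


def load_options(ini_text):
    """Flatten an INI document into {(section, option): value}."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {
        (section, option): value
        for section in parser.sections()
        for option, value in parser.items(section)
    }


def diff_configs(text_a, text_b):
    """Return {(section, option): (value_a, value_b)} for every difference.

    A value of None means the option is absent on that side.
    """
    a, b = load_options(text_a), load_options(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical file contents for the two hosts.
POC1 = "[ovs]\nlocal_ip = 192.0.2.11\n[agent]\ntunnel_types = vxlan\nl2_population = True\n"
POC2 = "[ovs]\nlocal_ip = 192.0.2.12\n[agent]\ntunnel_types = vxlan\n"

for (section, option), (v1, v2) in diff_configs(POC1, POC2).items():
    print(f"[{section}] {option}: poc1={v1!r} poc2={v2!r}")
```

Some differences are expected. `local_ip`, for instance, is the tunnel endpoint address and should be different on each host. The interesting entries are options that are present on one host and missing or spelled differently on the other.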

Thanks, Bernd. I've updated my question to include ml2_conf.ini (I don't have enough ask.openstack points yet to upload files). Thanks for pointing out the misspelling. I'll make the fix and report back.

rlr (2019-08-14 08:06:18 -0500)

Using tcpdump on both poc1 and poc2, I can see that ARP requests for IP addresses on the tunneled (VXLAN tenant) network are not getting responses. I'll be spending some time figuring out how this is intended to work, but any advice to get me moving down the right path would be appreciated.

rlr (2019-08-14 14:14:54 -0500)
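
When a capture is long, listing the ARP requests that never got a reply can be automated. The Python sketch below scans saved tcpdump text for `who-has` and `is-at` lines. It assumes tcpdump's usual ARP wording as seen in the capture in the question; the helper name and the `10.0.0.x` addresses in the sample are hypothetical.

```python
import re

# tcpdump prints "Request who-has <ip> ..." and "Reply <ip> is-at <mac>".
REQUEST = re.compile(r"Request who-has (\S+)")
REPLY = re.compile(r"Reply (\S+) is-at")


def unanswered_arp_targets(capture_text):
    """Return requested IPs, in first-seen order, that have no reply in the capture."""
    requested = []
    answered = set()
    for line in capture_text.splitlines():
        match = REQUEST.search(line)
        if match:
            if match.group(1) not in requested:
                requested.append(match.group(1))
            continue
        match = REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return [ip for ip in requested if ip not in answered]


# Hypothetical capture: two requests, only one of them answered.
SAMPLE_CAPTURE = """\
fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.5 tell 10.0.0.3, length 28
fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.7 tell 10.0.0.3, length 28
fa:16:3e:11:22:33 > fa:16:3e:d5:bb:0d, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.7 is-at fa:16:3e:11:22:33, length 28
"""

print(unanswered_arp_targets(SAMPLE_CAPTURE))
```

Running the same check on captures taken at each hop (the sender's tap port, the physical NIC on each host, the receiver's tap port) shows the last point at which a request is still visible, which narrows down where it is dropped.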

How Neutron blocks access to MAC addresses is an area that I don't know well. I thought that ARPs were unnecessary thanks to the L2population driver, but perhaps that driver doesn't work because ARPs don't get through?

You say there are no responses - do you see them arrive on the other side?

Bernd Bausch (2019-08-15 02:35:58 -0500)