How do I debug OVS/VXLAN tenant network issues?

asked 2019-08-13 15:52:48 -0500

rlr

updated 2019-08-15 16:08:58 -0500

I'm working through a Proof of Concept with Stein. I have two physical servers configured as 1) a controller/network/compute (host=poc2) and 2) just a compute (host=poc1).

There are two physical Ethernet ports on each server, and I have successfully set up a provider network on an isolated flat LAN (192.168.0.0/24) that the VM instances can reach. My OpenStack management traffic is on the other LAN. I'm using openvswitch as the ML2 mechanism driver. I have validated connectivity: 4 VM instances spread across the two compute nodes (2 each) can ping each other in a full mesh.

The next step is to create an overlay tenant network using VXLAN over this flat provider network. I appear to have this configured correctly, as I see the tap points set up, and the ovs-vsctl command shows the br-tun with appropriate ports. Additionally, co-hosted VM instances can exchange ping traffic via their tap ports (on poc1 or on poc2), but any attempt to use the overlay/tenant network to ping between servers (poc1 and poc2) fails.

I do have the L2population driver loaded, which I know is supposed to handle the ARP updates. I suspect something is wrong with this, but it is certainly possible something else is going on.

How do I attack this problem? I can't seem to find any log files that show ARP exchanges or L2pop activities.

Thanks for any guidance. Relevant config follows...

Here is an ARP capture taken with tcpdump on the controller's physical NIC (em2). Note that no messages come into the tunnel bridge. In this case, the tenant network is the IP address space 192.168.2.128/25, and the provider network is 192.168.2.0/25.

192.168.2.12.53375 > 192.168.2.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 83
fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.203 tell 192.168.2.212, length 28
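
As a cross-check on the capture above, the VXLAN header fields can be decoded by hand and the VNI compared with the configured ranges. This is a minimal sketch, assuming the 8-byte header layout from RFC 7348; the helper names `parse_vxlan_header` and `vni_in_range` are illustrative and not part of any OpenStack tool.

```python
import struct


def parse_vxlan_header(header: bytes) -> dict:
    """Decode the 8-byte VXLAN header (layout assumed from RFC 7348)."""
    if len(header) < 8:
        raise ValueError("a VXLAN header is 8 bytes long")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    word1, word2 = struct.unpack("!II", header[:8])
    flags = word1 >> 24
    return {
        "flags": flags,
        "vni_valid": bool(flags & 0x08),  # the "I" flag shown by tcpdump
        "vni": word2 >> 8,
    }


def vni_in_range(vni: int, vni_range: str) -> bool:
    """Check a VNI against a 'min:max' string such as vni_ranges=10:100."""
    low, high = (int(part) for part in vni_range.split(":"))
    return low <= vni <= high


# Header rebuilt from the values tcpdump printed: flags 0x08, vni 83.
header = struct.pack("!II", 0x08 << 24, 83 << 8)
fields = parse_vxlan_header(header)
print(fields)
print("in vni_ranges=10:100:", vni_in_range(fields["vni"], "10:100"))
print("in tunnel_id_ranges=1001:2000:", vni_in_range(fields["vni"], "1001:2000"))
```

VNI 83 sits inside `vni_ranges=10:100` from ml2_conf.ini and outside `tunnel_id_ranges=1001:2000` from the agent file; the sketch does not tell you which of the two settings Neutron actually honours.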


My openvswitch agent configuration:

        [ovs]
        vxlan_udp_port=4789
        tunnel_type=vxlan
        tunnel_id_ranges=1001:2000
        tenant_network_type=vxlan
        local_ip=192.168.2.12
        enalbe_tunneling=True
        bridge_mappings=provider:br-provider
        integration_bridge=br-int
        tunnel_bridge=br-tun

        [agent]
        l2_population=True
        drop_flows_on_start=False
        tunnel_types=vxlan
        vxlan_udp_port=4789
        polling_interval=2

        [securitygroup]
        firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver


My ml2_conf.ini:

    [ml2]
    type_drivers=flat,vxlan
    tenant_network_types=vxlan
    mechanism_drivers=openvswitch,l2population
    path_mtu=0
    extension_drivers=port_security,qos

    [securitygroup]
    enable_security_group=True
    firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

    [ml2_type_geneve]
    #max_header_size=38
    #vni_ranges=10:100

    [ml2_type_flat]
    flat_networks=provider

    [ml2_type_vxlan]
    vxlan_group=224.0.0.1
    vni_ranges=10:100

    #[ovs]
    bridge_mappings = provider:br-provider
    integration_bridge = br-int
    tenant_network_type = vxlan


My poc2 (controller/network/compute) server with two VM instances, and em2 as the physical NIC to the provider network:

        [root@poc2 ~(keystone_demo)]# ovs-vsctl show
        954225e6-4e81-42e1-ae90-8bac02f38e9d
            Manager "ptcp:6640:127.0.0.1"
                is_connected: true
            Bridge br-int
                Controller "tcp:127.0.0.1:6633"
                    is_connected: true
                fail_mode: secure
                Port "tap6c607995-02"
                    tag: 1
                    Interface "tap6c607995-02"
                        type: internal
                Port "ovn-89320c-0"
                    Interface ...

Comments

I don't know if it's relevant, but your openvswitch configuration misspells enable_tunneling as enalbe_tunneling (this option is deprecated anyway), and in the [ovs] section it also writes tunnel_types as tunnel_type.

I see a reference to Geneve in br-int. If only for clarity, I would remove it from ml2_conf.ini.

Bernd Bausch (2019-08-13 19:22:55 -0500)
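
Misspelled option names like the ones mentioned in this comment are easy to miss, since an unrecognized key in an INI file may simply go unused. A minimal sketch of a typo check follows, assuming a hand-written list of expected names; `KNOWN_NAMES` holds only spellings used in this thread and is not Neutron's option schema, and `find_likely_typos` is an illustrative helper.

```python
import configparser
import difflib

# Assumption: a small hand-written list of correctly spelled option names,
# taken from this thread. It is NOT a complete list of Neutron options.
KNOWN_NAMES = ["enable_tunneling", "tunnel_types", "local_ip", "l2_population"]


def find_likely_typos(config_text, known_names=KNOWN_NAMES):
    """Return (section, option, suggestion) for options that closely
    resemble a known name without matching it exactly."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    findings = []
    for section in parser.sections():
        for option in parser.options(section):
            if option in known_names:
                continue
            close = difflib.get_close_matches(option, known_names, n=1, cutoff=0.8)
            if close:
                findings.append((section, option, close[0]))
    return findings


# Excerpt of the agent configuration quoted in the question.
sample = """
[ovs]
tunnel_type=vxlan
local_ip=192.168.2.12
enalbe_tunneling=True

[agent]
l2_population=True
tunnel_types=vxlan
"""

for section, option, suggestion in find_likely_typos(sample):
    print(f"[{section}] {option}: did you mean {suggestion}?")
```

Options that resemble nothing in the list are left alone, so the check only points at probable typos and does not decide whether an option is valid.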

Check whether the poc1 and poc2 configs are identical. Feel free to add the other config files to the question, in particular ml2_conf.ini.

Other than that, my approach would be to trace packets and find out where they get lost. This is tedious, I know.

Bernd Bausch (2019-08-13 19:25:06 -0500)
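
The config comparison suggested in this comment can be scripted rather than done by eye. This is a minimal sketch, assuming both files are plain INI text; `diff_configs` is an illustrative helper, and the two sample strings are hypothetical stand-ins, not the real poc1 and poc2 files. `local_ip` is skipped by default because each host is expected to use its own tunnel endpoint address.

```python
import configparser


def load_options(config_text):
    """Flatten INI text into {(section, option): value}."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    return {
        (section, option): parser.get(section, option)
        for section in parser.sections()
        for option in parser.options(section)
    }


def diff_configs(text_a, text_b, ignore=("local_ip",)):
    """Return (section, option, value_a, value_b) for every difference.
    A value of None means the option is missing on that side."""
    a, b = load_options(text_a), load_options(text_b)
    diffs = []
    for section, option in sorted(set(a) | set(b)):
        if option in ignore:
            continue
        value_a = a.get((section, option))
        value_b = b.get((section, option))
        if value_a != value_b:
            diffs.append((section, option, value_a, value_b))
    return diffs


# Hypothetical excerpts, for illustration only.
host_a = "[ovs]\nlocal_ip=192.168.2.12\n[agent]\ntunnel_types=vxlan\nl2_population=True\n"
host_b = "[ovs]\nlocal_ip=192.168.2.11\n[agent]\ntunnel_types=vxlan\n"

for section, option, value_a, value_b in diff_configs(host_a, host_b):
    print(f"[{section}] {option}: {value_a!r} vs {value_b!r}")
```

To compare real files, read each one into a string first and pass the two strings in.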

Thanks, Bernd. I've updated my question to include ml2_conf.ini (I don't have enough ask.openstack points yet to upload files). Thanks for pointing out the misspelling. I'll make the fix and report back.

rlr (2019-08-14 08:06:18 -0500)

Using tcpdump on both poc1 and poc2, I can see that ARP requests for IP addresses on the tunneled VXLAN tenant network are not getting responses. I'll be spending some time figuring out how this is intended to work, but any advice to get me moving down the right path would be appreciated.

rlr (2019-08-14 14:14:54 -0500)

How Neutron handles MAC address resolution is an area that I don't know well. I thought that ARPs were unnecessary thanks to the L2population driver, but perhaps that driver doesn't work because ARPs don't get through?

You say there are no responses - do you see them arrive on the other side?

Bernd Bausch (2019-08-15 02:35:58 -0500)