# How do I debug OVS/VXLAN tenant network issues?

I'm working through a Proof of Concept with Stein. I have two physical servers configured as 1) a controller/network/compute (host=poc2) and 2) just a compute (host=poc1).

There are two physical Ethernet ports on each server, and I have successfully set up a provider network on an isolated flat LAN (192.168.0.0/24) that grants VM instances access. My OpenStack management traffic is on the other LAN. I'm using openvswitch for the ML2 driver. Connectivity passes validation, with 4 VM instances on either compute node (2 each) able to ping each other in a full mesh.

The next step is to create an overlay tenant network using VXLAN over this flat provider network. I appear to have this configured correctly, as I see the tap points set up, and the ovs-vsctl command shows the br-tun with appropriate ports. Additionally, co-hosted VM instances can exchange ping traffic via their tap ports (on poc1 or on poc2), but any attempt to use the overlay/tenant network to ping between servers (poc1 and poc2) fails.

I do have the L2population driver loaded, which I know is supposed to handle the ARP updates. I suspect something is wrong with this, but it is certainly possible something else is going on.

How do I attack this problem? I can't seem to find any log files that show ARP exchanges or L2pop activities.

Thanks for any guidance. Relevant config follows...

Here is an ARP capture taken with tcpdump on the controller's physical NIC (em2). Note that no messages come into the tunnel bridge. In this case, the tenant network uses the IP address space 192.168.2.128/25, and the provider network is 192.168.2.0/25.

192.168.2.12.53375 > 192.168.2.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 83
fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.203 tell 192.168.2.212, length 28
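
One quick sanity check on a capture like this is to pull the VNI out of the VXLAN header line and compare it with the `vni_ranges` value from `ml2_conf.ini`. The following is only a minimal sketch: the function names are made up for illustration, and the regular expression assumes nothing beyond the tcpdump text format shown above.

```python
import re

# Matches the outer VXLAN line of tcpdump's text output, for example:
# "192.168.2.12.53375 > 192.168.2.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 83"
VXLAN_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<port>\d+):"
    r".*VXLAN.*vni (?P<vni>\d+)"
)


def parse_vxlan_line(line):
    """Return (src_ip, dst_ip, udp_port, vni), or None if the line is not VXLAN."""
    m = VXLAN_RE.search(line)
    if m is None:
        return None
    return m["src"], m["dst"], int(m["port"]), int(m["vni"])


def vni_in_ranges(vni, ranges):
    """Check a VNI against ml2_conf.ini syntax such as "10:100" or "10:100,200:300"."""
    for part in ranges.split(","):
        low, high = (int(x) for x in part.split(":"))
        if low <= vni <= high:
            return True
    return False


line = ("192.168.2.12.53375 > 192.168.2.11.4789: [no cksum] VXLAN, "
        "flags [I] (0x08), vni 83")
src, dst, port, vni = parse_vxlan_line(line)
print(src, dst, port, vni, vni_in_ranges(vni, "10:100"))
```

If the VNI falls inside the configured range and the endpoints match each host's `local_ip`, the encapsulation itself is probably not where the packets get lost, although that is only a hint and not proof.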

My openvswitch agent configuration:

[ovs]
vxlan_udp_port=4789
tunnel_type=vxlan
tunnel_id_ranges=1001:2000
tenant_network_type=vxlan
local_ip=192.168.2.12
enalbe_tunneling=True
bridge_mappings=provider:br-provider
integration_bridge=br-int
tunnel_bridge=br-tun

[agent]
l2_population=True
drop_flows_on_start=False
tunnel_types=vxlan
vxlan_udp_port=4789
polling_interval=2

[securitygroup]
firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

My ml2_conf.ini:

[ml2]
type_drivers=flat,vxlan
tenant_network_types=vxlan
mechanism_drivers=openvswitch,l2population
path_mtu=0
extension_drivers=port_security,qos

[securitygroup]
enable_security_group=True
firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

[ml2_type_geneve]
#vni_ranges=10:100

[ml2_type_flat]
flat_networks=provider

[ml2_type_vxlan]
vxlan_group=224.0.0.1
vni_ranges=10:100

#[ovs]
bridge_mappings = provider:br-provider
integration_bridge = br-int
tenant_network_type = vxlan

My poc2 (controller/network/compute) server with two VM instances, and em2 as the physical NIC to the provider network:

[root@poc2 ~(keystone_demo)]# ovs-vsctl show
954225e6-4e81-42e1-ae90-8bac02f38e9d
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "tap6c607995-02"
            tag: 1
            Interface "tap6c607995-02"
                type: internal
        Port "ovn-89320c-0"
            Interface ...

I don't know if it's relevant, but in your openvswitch configuration enable_tunneling is misspelled as enalbe_tunneling (this option is deprecated anyway), and tunnel_types is misspelled as tunnel_type in one place.

I see a reference to Geneve in br-int. If only for clarity, I would remove it from ml2_conf.ini.

(2019-08-13 19:22:55 -0500)
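
Misspelled option names are easy to miss because an unknown key in an ini file is generally just ignored. As a rough illustration, here is a small sketch that flags options that look like near misses of a known name. The `REFERENCE_NAMES` list is an assumption built only from the names discussed in this thread; it is not the full set of options the Neutron OVS agent accepts, and `find_near_misses` is a made-up helper.

```python
import configparser
import difflib

# Hypothetical reference list: correctly spelled names taken from this thread.
# It is NOT the complete option set of the Neutron OVS agent.
# enable_tunneling is included only for spell checking; the comment above
# says it is deprecated.
REFERENCE_NAMES = [
    "bridge_mappings", "drop_flows_on_start", "enable_tunneling",
    "integration_bridge", "l2_population", "local_ip", "polling_interval",
    "tunnel_bridge", "tunnel_types", "vxlan_udp_port",
]


def find_near_misses(ini_text, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (section, option, suggestion) for options that look like typos."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    hits = []
    for section in parser.sections():
        for option in parser.options(section):
            if option in reference:
                continue  # spelled exactly like a known name
            close = difflib.get_close_matches(option, reference, n=1, cutoff=cutoff)
            if close:
                hits.append((section, option, close[0]))
    return hits


sample = """
[ovs]
tunnel_type=vxlan
local_ip=192.168.2.12
enalbe_tunneling=True

[agent]
tunnel_types=vxlan
"""

for section, option, suggestion in find_near_misses(sample):
    print(f"[{section}] {option}: did you mean {suggestion}?")
```

The sketch only reports names that are close to a reference name, so an option missing from the list is silently skipped rather than reported as an error.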

Check whether the poc1 and poc2 configs are identical. Feel free to add the other config files to the question, in particular ml2_conf.ini.

Other than that, my approach would be tracing packets and find out where they get lost. This is tedious, I know.

(2019-08-13 19:25:06 -0500)
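
To make the packet tracing a little less tedious, it can help to write down the capture points in order, from the instance outward, and run the same set on both hosts. Below is a sketch that only prints the commands rather than running them. `capture_plan` is a hypothetical helper; the interface names `tap6c607995-02` and `em2` are taken from the poc2 output in the question and will differ on poc1.

```python
import shlex


def capture_plan(tap_iface, phys_iface, vxlan_port=4789, count=20):
    """Return (description, argv) pairs, ordered from the VM outward."""
    # -n: no name resolution, -e: print MAC addresses, -c: stop after N packets
    tcpdump = ["tcpdump", "-n", "-e", "-c", str(count), "-i"]
    return [
        ("ARP leaving the instance",
         tcpdump + [tap_iface, "arp"]),
        ("VXLAN traffic on the physical NIC",
         tcpdump + [phys_iface, "udp", "port", str(vxlan_port)]),
        ("flows installed on the tunnel bridge",
         ["ovs-ofctl", "dump-flows", "br-tun"]),
    ]


for description, argv in capture_plan("tap6c607995-02", "em2"):
    print(f"# {description}")
    print(shlex.join(argv))
```

The first hop where a packet shows up on the sending side but not on the next capture point is where to look more closely.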

Thanks Bernd. I've updated my question to include ml2_conf.ini (I don't have enough ask.openstack points yet to upload files). Thanks for pointing out the misspelling. I'll make the fix and report back.

(2019-08-14 08:06:18 -0500)

Using tcpdump on both poc1 and poc2, I can see that ARP requests for IP addresses on the tunneled VXLAN tenant network are not getting responses. I'll be spending some time figuring out how this is intended to work, but any advice to get me moving down the right path would be appreciated.

(2019-08-14 14:14:54 -0500)
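
For longer captures, a short script can list which ARP targets never got a reply. This is a minimal sketch that assumes tcpdump's usual text format for ARP ("Request who-has ... tell ..." and "Reply ... is-at ..."); `unanswered_arp` is a made-up name, and the sample line is the request from the capture in the question.

```python
import re

# The optional "(mac)" group covers requests that carry a target hardware address.
REQUEST_RE = re.compile(r"Request who-has (\S+)(?: \(\S+\))? tell (\S+?),")
REPLY_RE = re.compile(r"Reply (\S+) is-at ([0-9a-f:]+)")


def unanswered_arp(lines):
    """Return the set of target IPs that were asked for but never answered."""
    asked, answered = set(), set()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            asked.add(m.group(1))
            continue
        m = REPLY_RE.search(line)
        if m:
            answered.add(m.group(1))
    return asked - answered


capture = [
    "fa:16:3e:d5:bb:0d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: "
    "Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.203 "
    "tell 192.168.2.212, length 28",
]
print(unanswered_arp(capture))
```

Running it on captures from both hosts separately would show whether the request ever reaches the far side, or whether it arrives and the reply is what goes missing.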

How Neutron blocks access to MAC addresses is an area that I don't know well. I thought that ARPs were unnecessary thanks to the L2population driver, but perhaps that driver doesn't work because ARPs don't get through?

You say there are no responses - do you see them arrive on the other side?

(2019-08-15 02:35:58 -0500)