Floating IP interface (fg) in FIP namespace duplicates packets

Hi guys, I haven't been able to get my head around this for a few days now. I am using Open vSwitch networking with the ML2 plugin, with ARP responder and L2 population enabled, in a DVR setup. I should probably also mention the kernel version (3.19.0-42) and the network interface kernel module (i40e), because at this point I have no idea what's going on.

The symptoms were unreachable floating IP addresses on the instances and intermittent external connectivity from inside the instances. The first round of debugging led to an interesting result: the bridge I use to connect to the external network (br-ex) had learned the fg- interface's MAC address on the wrong side, that is, on the port connecting to the outside world instead of on the phy-br-ex port connected to the integration bridge (br-int):

(dev) root@computenode:~$ ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00001c40242b758a
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 2(vlan420): addr:1c:40:24:2b:75:8a
     config:     0
     state:      0
     current:    10GB-FD
     speed: 10000 Mbps now, 0 Mbps max
 5(phy-br-ex): addr:7a:04:1f:e5:f2:90
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:1c:40:24:2b:75:8a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

(dev) root@computenode:~$ ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    2     0  fa:16:3e:33:9b:4b    2
    2     0  00:08:e3:ff:fd:90    2

(dev) root@computenode:~$ ip netns
fip-8b87c295-c6f0-46d4-b6b1-a13b6f50a1fa
qrouter-ef83504d-d07b-4bf3-9245-dd0231d8b331

(dev) root@computenode:~$ ip netns exec fip-8b87c295-c6f0-46d4-b6b1-a13b6f50a1fa ifconfig
fg-bd6cc674-ab Link encap:Ethernet  HWaddr fa:16:3e:33:9b:4b
          inet addr:37.9.173.202  Bcast:37.9.173.223  Mask:255.255.255.224
          inet6 addr: fe80::f816:3eff:fe33:9b4b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1604 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:524844 (524.8 KB)  TX bytes:138281 (138.2 KB)

Note: vlan420 is the interface with access to the external network.

As you can see, br-ex's FDB lists the fg port's MAC (fa:16:3e:33:9b:4b) on port 2 (vlan420) instead of the expected port 5 (phy-br-ex). As a result, no packet destined for the floating IP gets past this point: the bridge looks up the destination MAC, finds it on the same port the packet arrived on, and drops the packet.
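To make the drop concrete, here is a toy model of a MAC-learning bridge. It is a simplified sketch that assumes br-ex forwards with the standard learning-switch rules (learn the source MAC on the ingress port; drop if the destination was learned on that same port). It is not OVS code: the class name and constants are hypothetical, and the port numbers come from the `ovs-ofctl show br-ex` output above.

```python
# Toy learning bridge (hypothetical, NOT Open vSwitch source code).
# Port numbers follow `ovs-ofctl show br-ex`: 2 = vlan420, 5 = phy-br-ex.
from typing import Dict, List

UPLINK = 2  # vlan420, towards the external network
PATCH = 5   # phy-br-ex, towards br-int and the FIP namespace
ALL_PORTS = [UPLINK, PATCH]

FG_MAC = "fa:16:3e:33:9b:4b"
GW_MAC = "00:08:e3:ff:fd:90"
BCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.fdb: Dict[str, int] = {}  # MAC -> port it was last seen on

    def receive(self, in_port: int, src: str, dst: str) -> List[int]:
        """Learn src on in_port, then return the ports the frame goes out on."""
        self.fdb[src] = in_port
        out = self.fdb.get(dst)
        if out is None:
            # Unknown destination (or broadcast): flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination was learned behind the ingress port: drop.
            return []
        return [out]


br = LearningBridge(ALL_PORTS)
# 1. fg sends a gratuitous ARP, entering br-ex from br-int.
print(br.receive(PATCH, FG_MAC, BCAST))   # -> [2]
# 2. A copy of the same frame comes back in on vlan420; fg is relearned there.
print(br.receive(UPLINK, FG_MAC, BCAST))  # -> [5]
# 3. The upstream gateway sends traffic to fg's MAC: it is dropped.
print(br.receive(UPLINK, GW_MAC, FG_MAC)) # -> []
```

Step 2 is the looped copy described in the post; once it has been seen, every frame from the outside addressed to fg's MAC matches step 3.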

So I speculated further: something must be looping back packets that leave the FIP namespace through br-ex and vlan420 and forwarding them back into br-ex, so the poor bridge learns that fg's MAC is on the other side. I started shutting down all redundant networking and the other compute nodes until nothing was left but a switch, one patch cable and the server. The issue persisted.

Sending an ARP or ICMP packet from inside the FIP namespace to the outside world illustrates the issue. There are always two (almost) identical packets on the external network interface: one is TX'ed and the other is RX'ed. How on Earth that is possible, I don't know.

(dev) root@computenode:~$ ip netns exec fip-8b87c295-c6f0-46d4-b6b1-a13b6f50a1fa arping -A -I fg-bd6cc674-ab -c 1 -w 1 150.150.150.150
ARPING 150.150.150.150 from 150.150.150.150 fg-bd6cc674-ab
Sent 1 probes (1 broadcast(s))

... run in parallel on the external interface:
(dev) root@computenode:~$ tcpdump -i vlan420 -e -n arp and host 150.150.150.150
02:59:00.603955 fa:16:3e:33:9b:4b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Reply 150.150.150.150 is-at fa:16:3e:33:9b:4b, length 28
02:59:00.603976 fa:16:3e:33:9b:4b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Reply 150.150.150.150 is-at fa:16:3e:33:9b:4b, length 46

... the returned packet immediately hits br-ex from the other side:
(dev) root@computenode:~$ ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    2     0  00:08:e3:ff:fd:90    5
    2     0  fa:16:3e:33:9b:4b    2

Notice the 21-microsecond difference between the two packets, and their sizes? That makes me think the packet was looped back locally by some internal mechanism, but only after it had passed through the link layer (since the second one has been padded to the minimum Ethernet frame size of 60 bytes). But maybe I'm wrong, and I would be grateful for any advice.
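The numbers in the two tcpdump lines can be checked with a bit of arithmetic. This sketch assumes standard Ethernet II framing with an IPv4 ARP payload, and that tcpdump's reported length excludes the 4-byte FCS; the helper names are hypothetical. The 42-byte frame is the ARP as handed to the interface (padding is normally added later by the driver or NIC), while the 60-byte copy carries 18 bytes of padding, which is why tcpdump reports the inner ARP length as 46.

```python
# Recompute the sizes and timing from the two tcpdump lines.
# Assumes Ethernet II + ARP for IPv4; tcpdump lengths exclude the FCS.
ETH_HEADER = 14        # dst MAC (6) + src MAC (6) + EtherType (2)
ARP_IPV4 = 28          # ARP payload for Ethernet/IPv4
MIN_FRAME_NO_FCS = 60  # 64-byte Ethernet minimum minus the 4-byte FCS


def on_wire_length(payload: int) -> int:
    """Frame length once short frames are padded to the Ethernet minimum."""
    return max(ETH_HEADER + payload, MIN_FRAME_NO_FCS)


def parse_ts_us(ts: str) -> int:
    """Convert an HH:MM:SS.ffffff tcpdump timestamp to integer microseconds."""
    hms, frac = ts.split(".")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1_000_000 + int(frac.ljust(6, "0"))


sent = ETH_HEADER + ARP_IPV4         # unpadded frame, as seen on TX
received = on_wire_length(ARP_IPV4)  # padded frame, as seen on RX
padding = received - sent
print(sent, received, ARP_IPV4 + padding)  # 42 60 46
print(parse_ts_us("02:59:00.603976") - parse_ts_us("02:59:00.603955"))  # 21
```

The padded size alone is consistent with the second copy having been through a MAC layer; it does not by itself show where the loop happens.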

Adding a flow to br-ex that drops incoming packets carrying fg's source MAC gets the job done, but the fact that all traffic from the VMs to the outside world keeps getting duplicated forever drives me mad. Thank you for any help.
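For reference, a drop flow of the kind described above could look like the one built below. This is a minimal sketch, assuming the flow matches on the uplink's OpenFlow port number (2, i.e. vlan420, per the `ovs-ofctl show br-ex` output) and fg's MAC; the exact flow used in the post is not shown, and the priority of 10 and the helper names are hypothetical. It defaults to a dry run that only prints the command. Flows added by hand may also be wiped if the agent later reconfigures the bridge.

```python
# Hypothetical helper for the workaround: drop frames entering br-ex from
# the uplink that carry fg's own MAC as source. Priority 10 is an assumption;
# it only needs to be higher than the bridge's catch-all NORMAL flow.
import shlex
import subprocess
from typing import List


def drop_own_mac_cmd(bridge: str, in_port: int, mac: str,
                     priority: int = 10) -> List[str]:
    """Build the argv for `ovs-ofctl add-flow` with a source-MAC drop rule."""
    flow = f"priority={priority},in_port={in_port},dl_src={mac},actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


def run_or_print(cmd: List[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command, or execute it when dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


cmd = drop_own_mac_cmd("br-ex", 2, "fa:16:3e:33:9b:4b")
run_or_print(cmd)
# ovs-ofctl add-flow br-ex priority=10,in_port=2,dl_src=fa:16:3e:33:9b:4b,actions=drop
```

Keeping the flow in a helper makes it easy to regenerate for another node's fg MAC; it only masks the symptom, since the duplicate frames still arrive on vlan420.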