VMs can't communicate if they spawn on different compute nodes.
I have a fairly typical setup running Icehouse. I installed it using RDO but made many modifications post-install. I'll describe my problem here, with my configuration details below. Basically, if two VMs spawn on the same compute node they can communicate without problems, but if they spawn on different compute nodes they cannot communicate. The same applies to communicating with the controller/network node: on startup, no VM can contact the DHCP server running on the controller/network node.
I've done quite a bit of background research, and I tried analyzing the network stack on the compute nodes using tcpdump. I pasted the compute network stack image below for reference. If I ping the DHCP server from one of these VMs, I can see the ARP requests all the way to phy-br-eth3 (the last veth pair), but I can't see the ARP requests on br-eth3. I'm struggling to understand how the veth pair is associated with a bridge and why it's not passing the traffic forward. Any help would be greatly appreciated.
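For what it's worth, this is roughly how I've been checking which bridge each end of the veth pair belongs to, and where the ARP requests die (interface names are from my setup; adjust if yours differ):

# Ask OVS which bridge each end of the veth pair is plugged into:
ovs-vsctl port-to-br phy-br-eth3   # expect: br-eth3
ovs-vsctl list-ports br-eth3       # phy-br-eth3 and eth3 should both appear
ovs-vsctl list-ports br-int        # the int-* end should appear here

# Capture ARP on the veth end and on the bridge's internal port
# while pinging the DHCP server from a VM:
tcpdump -n -e -i phy-br-eth3 arp
tcpdump -n -e -i br-eth3 arp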
Controller + Network Node     Compute1                 Compute2
eth0 -- 192.168.100.1         eth0 -- 192.168.100.2    eth0 -- 192.168.100.3
eth3 -- 20.20.0.1             eth3 -- 20.20.0.2        eth3 -- 20.20.0.3
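Since the working/failing split is exactly same-node vs. cross-node, one thing I plan to verify is whether tagged frames actually make it across the physical switch on the eth3 path. A rough sketch of that check (interface names are from my setup):

# While pinging from a VM on compute1 to a VM on compute2, watch eth3
# on both hosts. -e prints the link-level header including the 802.1Q
# tag, so you can see whether frames leave tagged and whether they
# ever arrive on the far side.
tcpdump -n -e -i eth3 vlan        # run on compute1 (sender side)
tcpdump -n -e -i eth3 vlan        # run on compute2 (receiver side)

If tagged frames leave compute1 but never show up on compute2's eth3, the switch in between is presumably not trunking the tenant VLAN range.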
I'm using the openvswitch plugin with VLANs. Here's the config:
[OVS]
vxlan_udp_port=4789
network_vlan_ranges=default:200:4094
tenant_network_type=vlan
enable_tunneling=False
integration_bridge=br-int
bridge_mappings=default:br-eth3
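In case it helps, here's how I understand the VLAN translation can be checked (a sketch; the exact flow entries depend on what the OVS agent programmed on each node):

# Dump the flows the neutron OVS agent installed on each node.
# Outbound, br-eth3 should carry mod_vlan_vid rules rewriting the
# internal VLAN tag to the provider VLAN before frames hit eth3;
# inbound, br-int should rewrite the provider VLAN back to the
# internal tag. If either table only has a drop or NORMAL rule,
# the agent never wired up the bridge mapping.
ovs-ofctl dump-flows br-eth3
ovs-ofctl dump-flows br-int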
I use eth3 for the private network and eth0 for the public one. Here's the network config for eth3 on the controller/network node (I won't paste it for the compute nodes as it's very similar):
[root@os-controller ~]# ifconfig br-eth3
br-eth3   Link encap:Ethernet  HWaddr 00:1E:68:37:F3:C9
          inet addr:20.20.0.1  Bcast:20.20.255.255  Mask:255.255.0.0
          inet6 addr: fe80::21e:68ff:fe37:f3c9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:445421 errors:0 dropped:0 overruns:0 frame:0
          TX packets:410007 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:131984658 (125.8 MiB)  TX bytes:96168683 (91.7 MiB)

[root@os-controller ~]# ifconfig eth3
eth3      Link encap:Ethernet  HWaddr 00:1E:68:37:F3:C9
          inet6 addr: fe80::21e:68ff:fe37:f3c9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:10149879 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14561098 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9496509970 (8.8 GiB)  TX bytes:18348225120 (17.0 GiB)
          Interrupt:17 Memory:fcd80000-fcda0000
Output of ovs-vsctl for the compute nodes (I'll only paste it for one) and the controller/network node:
[root@os-controller ~]# ovs-vsctl show
c8fbaa95-8288-4320-a25a-6b7feb155584
    Bridge "br-eth3"
        Port "br-eth3"
            Interface "br-eth3"
                type: internal
        Port "eth3"
            Interface "eth3"
    Bridge br-int
        Port "qr-10604ac2-f6"
            tag: 3
            Interface "qr-10604ac2-f6"
                type: internal
        Port int-vmdata
            Interface int-vmdata
        Port "tap57a88343-0c"
            tag: 2
            Interface "tap57a88343-0c"
                type: internal
    ...
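Since the int-*/phy-* devices are an ordinary Linux veth pair rather than OVS patch ports, this is how I've been trying to confirm which two devices are actually peered (a sketch; whether int-vmdata really is the peer of phy-br-eth3 is exactly what it checks):

# A veth device's ethtool statistics expose the ifindex of its peer:
ethtool -S phy-br-eth3            # look for peer_ifindex
ip link show                      # match that ifindex to a device name
ip -d link show phy-br-eth3       # "veth" should appear in the details

If the peer of phy-br-eth3 turns out not to be the int-* port sitting on br-int, then br-int and br-eth3 aren't actually connected, which would explain traffic dying between them.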
Wondering if you're still experiencing this problem six months on, or if you've had any success in moving past it?