
VMs on the same compute node fail to communicate on different VLANs

asked 2015-02-17 14:09:53 -0500

wp_bengleman

I have a 3-node setup using Neutron with VLAN segregation and external gateways. Instances spawned on different compute hosts and on different VLANs (i.e., VLAN 132 and VLAN 108) are able to communicate with each other over our existing infrastructure. Instances spawned on the same compute node, however, are not able to complete the TCP handshake. Security groups are wide open between the different networks and the external firewall settings have been verified.

The expected traffic flow is from instance A on VLAN 132 (10.32.0.84/22), to an external hardware firewall (which handles the default route for both VLANs), to instance B on VLAN 108 (10.8.200.82/16).
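To see where the handshake stalls, capturing on both instances' qvo ports on the compute node shows whether the SYN and SYN-ACK make it back from the firewall (the port name and peer address below are placeholders):

# find the qvoXXXXXXXX-XX port for each instance
ovs-vsctl list-ports br-int
# watch the handshake from instance A's side; substitute the real port name
tcpdump -nei qvoAAAAAAAA-AA 'tcp and host 10.8.200.82'
# repeat on instance B's port to see whether the SYN-ACK ever arrives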

OpenStack version: Juno
Open vSwitch version: 2.3.1
OS: Ubuntu 14.04 (Trusty)

ml2_conf.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch,l2_population
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = physnets:108:108,physnets:132:132
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True
enable_ipset = True
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

[ovs]
tenant_network_type = vlan
integration_bridge = br-int
bridge_mappings = physnets:ovsbr0
local_ip = 10.142.0.96
use_veth_interconnection = True

[agent]
physical_interface_mappings = physnets:bond0
arp_responder = True
l2_population = True
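For reference, this mapping points physnets at ovsbr0, which is also the bridge carrying our uplink bond; whether the mapped bridge exists on a given node can be checked with:

# confirm the mapped bridge is present and carries the uplink bond
ovs-vsctl list-br
ovs-vsctl list-ports ovsbr0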

Open vSwitch flows on br-int

[STACK] root@s-compute3:~# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2059.312s, table=0, n_packets=1342, n_bytes=217895, idle_age=72, priority=1 actions=NORMAL
 cookie=0x0, duration=2056.254s, table=0, n_packets=1090, n_bytes=94825, idle_age=72, priority=3,in_port=11,dl_vlan=132 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=2055.737s, table=0, n_packets=7675, n_bytes=547580, idle_age=0, priority=3,in_port=11,dl_vlan=108 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=2058.451s, table=0, n_packets=199398, n_bytes=12224756, idle_age=0, priority=2,in_port=11 actions=drop
 cookie=0x0, duration=2059.265s, table=23, n_packets=0, n_bytes=0, idle_age=2059, priority=0 actions=drop
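One way to check which of these rules a frame coming in from the uplink actually hits is ovs-appctl's ofproto/trace (the MAC addresses below are placeholders):

# trace a VLAN-108-tagged frame arriving on port 11 towards instance B's MAC
ovs-appctl ofproto/trace br-int in_port=11,dl_vlan=108,dl_src=aa:bb:cc:dd:ee:01,dl_dst=aa:bb:cc:dd:ee:02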

If anyone has any thoughts on why communication would fail when these instances run on the same compute node, I would be grateful. Thanks in Advance!

Brian


1 answer


answered 2015-03-13 16:21:36 -0500

wp_bengleman

Figured this out. In our original configuration we were pointing bridge_mappings at ovsbr0, which also happened to be our uplink bridge. The problem seems to be Open vSwitch hairpinning packets through the kernel module on that single bridge. To solve it we followed the advice from the folks over at the OpenCloudBlog ( http://www.opencloudblog.com/?p=557 ).

Effectively, we created a dedicated bridge for the uplink (the bonded interfaces) and separate bridges, br-vlan and br-int, for the provider VLANs and integration. These are all connected via patch ports.
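In ovs-vsctl terms the layout boils down to roughly the following (a sketch only; bond options such as LACP mode are omitted):

# dedicated uplink bridge carrying the bonded physical NICs
ovs-vsctl add-br br-uplink
ovs-vsctl add-bond br-uplink bond0 p3p1 p3p2
# separate bridge for the provider VLANs, referenced by bridge_mappings
ovs-vsctl add-br br-vlan
# patch ports between br-uplink and br-vlan
ovs-vsctl add-port br-uplink patch-to-vlan -- set interface patch-to-vlan type=patch options:peer=patch-to-uplink
ovs-vsctl add-port br-vlan patch-to-uplink -- set interface patch-to-uplink type=patch options:peer=patch-to-vlan
# br-int, br-tun and the int-br-vlan/phy-br-vlan pair are created by the
# neutron-openvswitch-agent once bridge_mappings points at br-vlan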

Updated config files:

[ml2]
type_drivers = vxlan,local,vlan,flat
tenant_network_types = vxlan
mechanism_drivers = openvswitch
[ml2_type_flat]
flat_networks = *

[ml2_type_vlan]
network_vlan_ranges = physnets:108:108,physnets:132:132

[ml2_type_vxlan]
vni_ranges = 65537:69999

[ovs]
tunnel_type = vxlan
tunnel_bridge = br-tun
tenant_network_type = vxlan
integration_bridge = br-int
tunnel_id_ranges = 65537:69999
enable_tunneling = True
bridge_mappings = physnets:br-vlan
local_ip = 10.143.0.96

[agent]
tunnel_types = vxlan

Open vSwitch configuration (ovs-vsctl show):

Bridge br-uplink
    Port "bond0"
        Interface "p3p2"
        Interface "p3p1"
    Port br-uplink
        Interface br-uplink
            type: internal
    Port patch-to-vlan
        Interface patch-to-vlan
            type: patch
            options: {peer=patch-to-uplink}
Bridge br-int
    fail_mode: secure
    Port "qvobd84afa7-1d"
        tag: 3
        Interface "qvobd84afa7-1d"
    Port int-br-vlan
        Interface int-br-vlan
            type: patch
            options: {peer=phy-br-vlan}
    Port br-int
        Interface br-int
            type: internal
    Port patch-tun
        Interface patch-tun
            type: patch
            options: {peer=patch-int}
Bridge br-tun
    Port br-tun
        Interface br-tun
            type: internal
    Port patch-int
        Interface patch-int
            type: patch
            options: {peer=patch-tun}
    Port "vxlan-0a8f005c"
        Interface "vxlan-0a8f005c"
            type: vxlan
            options: {df_default="true", in_key=flow, local_ip="10.143.0.96", out_key=flow, remote_ip="10.143.0.92"}
    Port "vxlan-0a8f005e"
        Interface "vxlan-0a8f005e"
            type: vxlan
            options: {df_default="true", in_key=flow, local_ip="10.143.0.96", out_key=flow, remote_ip="10.143.0.94"}
Bridge br-vlan
    Port br-vlan
        Interface br-vlan
            type: internal
    Port phy-br-vlan
        Interface phy-br-vlan
            type: patch
            options: {peer=int-br-vlan}
    Port patch-to-uplink
        Interface patch-to-uplink
            type: patch
            options: {peer=patch-to-vlan}
    Port "l3vxlan"
        tag: XXX
        Interface "l3vxlan"
            type: internal
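To pick up the new layout, the OVS agent has to be restarted on each node (plus the L3 and DHCP agents on the network node); on Trusty that is roughly:

service neutron-plugin-openvswitch-agent restart
# network node only:
service neutron-l3-agent restart
service neutron-dhcp-agent restart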

Hope this helps!


Comments

Update: the approach outlined above worked for a while, and then we started seeing problems again. After a deeper investigation, the problem turned out to be TCP sequence randomization at the firewall between the subnets, which was competing with iptables on the local host. Disabling it at the firewall fixed it.
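As an illustration only (the firewall model isn't named here), on a Cisco ASA the randomization can be disabled for just this traffic via the modular policy framework; all object names below are placeholders:

! placeholder names; match the ACL to your own OpenStack subnets
access-list OS-VLANS extended permit tcp 10.32.0.0 255.255.252.0 10.8.0.0 255.255.0.0
class-map OS-VLANS-CLASS
 match access-list OS-VLANS
policy-map global_policy
 class OS-VLANS-CLASS
  set connection random-sequence-number disable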

wp_bengleman ( 2015-05-08 10:05:45 -0500 )
