Why does network traffic get out of control when an instance is launched?

I have followed the standard Ubuntu 12.04 installation instructions with GRE networking. I have one controller node and 4 compute nodes, each with two network interfaces (one public, one private) as per the instructions. When I start an instance the network just goes crazy (I'm using 10.0.99.x as the private net and 10.0.20.x as the tenant net):

13:58:46.846449 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846453 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846458 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846465 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846471 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846476 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846482 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846489 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846494 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42
13:58:46.846500 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42

It generates gigabytes of traffic in seconds! The traffic can be ARP, DHCP requests, or just pings, and somehow it multiplies so that there is a constant flow of packets.
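
To put a number on the multiplication, one option is to strip the timestamps from a saved capture and count identical packet summaries. This is only a rough sketch: `count_repeats` is a name made up for illustration, and it assumes the capture was saved as text in the same tcpdump format as the lines above.

```python
import re
from collections import Counter

# Matches the leading tcpdump timestamp, e.g. "13:58:46.846449 ".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+\s+")

def count_repeats(lines):
    """Count how often each packet summary occurs once timestamps are removed."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[TIMESTAMP.sub("", line)] += 1
    return counts

# Three of the lines from the capture above.
sample = [
    "13:58:46.846449 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42",
    "13:58:46.846453 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42",
    "13:58:46.846458 IP 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, length 64: ARP, Reply 10.0.20.11 is-at fa:16:3e:e6:b9:b3 (oui Unknown), length 42",
]

counts = count_repeats(sample)
for summary, n in counts.most_common(3):
    print(n, summary)
```

A single summary with a very high count within a few milliseconds suggests the same frame is being forwarded over and over rather than being re-sent by the instance.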

After restarting all services, if I get lucky, the first instance works fine, but as soon as I create a second instance (which lands on a different node) the problems begin. When I terminate the second instance the traffic soon stops, but then the first instance is no longer accessible.

I have double-checked all the configuration files but I don't see anything wrong. Can someone suggest where to look in particular, or how to get more information about this?

Edit:

I have tried to understand the GRE traffic flow but it looks strange. Here is the beginning of an instance boot as seen from the controller:

fa:16:3e:14:e1:95 > Broadcast, vlan 1, p 0, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 00:22:4d:a8:6a:04 > 00:22:4d:a8:6c:92, 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request
fa:16:3e:14:e1:95 > Broadcast, vlan 2, p 0, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request
00:22:4d:a8:6c:92 > 00:22:4d:a8:6d:eb, 10.0.99.30 > 10.0.99.3: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request
00:22:4d:a8:6c:92 > 00:22:4d:69:2a:db, 10.0.99.30 > 10.0.99.4: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request
00:22:4d:a8:6c:92 > 00:22:4d:a8:6a:04, 10.0.99.30 > 10.0.99.5: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request
00:22:4d:a8:6c:92 > 00:22:4d:a8:71:b7, 10.0.99.30 > 10.0.99.6: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request

It seems like part of the traffic is sent in VLANs instead of tunnels. Is this OK for a GRE setup? Also, the DHCP request flows to the wrong network (key=0x5), whereas it should go to the 0x4 network where the instance is located. The first line displays the GRE-tunneled request coming from the instance, and it has the correct key=0x4. That seems to be the only one that is correct?
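
To make the key mismatch easier to see, the capture can be grouped by GRE key and tunnel endpoints. This is a minimal sketch, not a diagnosis: `tunnels_by_key` is a hypothetical helper, and the regular expression assumes the tcpdump output format shown above.

```python
import re
from collections import defaultdict

# Outer tunnel endpoints followed by the GRE key, as printed by tcpdump.
GRE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+) > (\d+\.\d+\.\d+\.\d+): GREv0, key=(0x[0-9a-f]+)"
)

def tunnels_by_key(lines):
    """Map each GRE key to the set of (source, destination) endpoint pairs seen."""
    result = defaultdict(set)
    for line in lines:
        for src, dst, key in GRE.findall(line):
            result[key].add((src, dst))
    return result

# The capture lines from the controller, copied from above.
capture = [
    "fa:16:3e:14:e1:95 > Broadcast, vlan 1, p 0, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 00:22:4d:a8:6a:04 > 00:22:4d:a8:6c:92, 10.0.99.5 > 10.0.99.30: GREv0, key=0x4, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
    "fa:16:3e:14:e1:95 > Broadcast, vlan 2, p 0, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
    "00:22:4d:a8:6c:92 > 00:22:4d:a8:6d:eb, 10.0.99.30 > 10.0.99.3: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
    "00:22:4d:a8:6c:92 > 00:22:4d:69:2a:db, 10.0.99.30 > 10.0.99.4: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
    "00:22:4d:a8:6c:92 > 00:22:4d:a8:6a:04, 10.0.99.30 > 10.0.99.5: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
    "00:22:4d:a8:6c:92 > 00:22:4d:a8:71:b7, 10.0.99.30 > 10.0.99.6: GREv0, key=0x5, fa:16:3e:14:e1:95 > Broadcast, 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request",
]

tunnels = tunnels_by_key(capture)
for key in sorted(tunnels):
    for src, dst in sorted(tunnels[key]):
        print(key, src, "->", dst)
```

Grouped this way, the request arrives from 10.0.99.5 with key 0x4 and is then sent out from 10.0.99.30 to all four compute nodes with key 0x5, including back to 10.0.99.5 where it came from.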

I've tried re-creating the br-int and br-ex interfaces but it didn't help. Is there something in Open vSwitch that needs to be cleared as well?