iptables INVALID rule preventing RST packets on closed ports between VMs

asked 2014-04-28 14:52:53 -0500

snewpy

updated 2014-04-30 06:04:11 -0500

I have two VMs on a single tenant network (10.5.0.0/24); let's call them vm1 and vm2. Both have the same security groups.

When something is listening on tcp/5432 on vm1, connections from vm2 to that port are successful as expected. However, if nothing is listening, then the RST packet responding to the closed port is dropped by the INVALID rule of vm2's physdev-in chain, making connections time out rather than being refused.

If I replace the DROP action of the INVALID rule with a LOG action, the RST packet gets through and the following packet is logged by netfilter:

IN=qbra8260acc-e8 OUT=qbra8260acc-e8 PHYSIN=qvba8260acc-e8 PHYSOUT=tapa8260acc-e8 MAC=fa:16:3e:12:38:32:fa:16:3e:91:4e:7a:08:00 SRC=10.5.0.42 DST=10.5.0.43 LEN=40 TOS=0x10 PREC=0x00 TTL=64 ID=22888 DF PROTO=TCP SPT=5432 DPT=55265 WINDOW=0 RES=0x00 ACK RST URGP=0
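
For anyone comparing several of these log lines, a small helper can split them into fields. This is only a sketch: parse_netfilter_log is a hypothetical name, and it assumes the LOG target output is a space-separated list of KEY=VALUE pairs plus bare flags, as in the line above.

```python
def parse_netfilter_log(line):
    """Split a netfilter LOG line into KEY=VALUE fields and bare flags."""
    fields, flags = {}, []
    for token in line.split():
        if "=" in token:
            # Split on the first "=" only, so values keep any later characters.
            key, _, value = token.partition("=")
            fields[key] = value
        else:
            # Tokens without "=" are flags such as DF, ACK or RST.
            flags.append(token)
    return fields, flags


# The log line from the question, reproduced verbatim.
LOG_LINE = (
    "IN=qbra8260acc-e8 OUT=qbra8260acc-e8 PHYSIN=qvba8260acc-e8 "
    "PHYSOUT=tapa8260acc-e8 MAC=fa:16:3e:12:38:32:fa:16:3e:91:4e:7a:08:00 "
    "SRC=10.5.0.42 DST=10.5.0.43 LEN=40 TOS=0x10 PREC=0x00 TTL=64 ID=22888 "
    "DF PROTO=TCP SPT=5432 DPT=55265 WINDOW=0 RES=0x00 ACK RST URGP=0"
)

fields, flags = parse_netfilter_log(LOG_LINE)
print(fields["SRC"], fields["SPT"], "->", fields["DST"], fields["DPT"], flags)
```

Here the bare flags come out as DF, ACK and RST, which confirms the logged packet is the reset sent from port 5432 on vm1 back to vm2.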

UPDATE: An additional data point -- these packets are only marked as invalid if the two VMs are running on the same compute node. If they are on different compute nodes, then the RST is not marked invalid.

vm1 (10.5.0.42)'s physdev-out iptables rules are:

-A neutron-openvswi-sg-chain -m physdev --physdev-out tap458d9c1a-32 --physdev-is-bridged -j neutron-openvswi-i458d9c1a-3
-A neutron-openvswi-i458d9c1a-3 -m state --state INVALID -j DROP
-A neutron-openvswi-i458d9c1a-3 -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-i458d9c1a-3 -p icmp -j RETURN
-A neutron-openvswi-i458d9c1a-3 -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-i458d9c1a-3 -p udp -m udp -m multiport --dports 1:65535 -j RETURN
-A neutron-openvswi-i458d9c1a-3 -s 10.5.0.44/32 -p tcp -m tcp --dport 5432 -j RETURN
-A neutron-openvswi-i458d9c1a-3 -s 10.5.0.43/32 -p tcp -m tcp --dport 5432 -j RETURN
-A neutron-openvswi-i458d9c1a-3 -s 10.5.0.3/32 -p udp -m udp --sport 67 --dport 68 -j RETURN
-A neutron-openvswi-i458d9c1a-3 -j neutron-openvswi-sg-fallback

vm2 (10.5.0.43)'s physdev-in rules are:

-A neutron-openvswi-sg-chain -m physdev --physdev-out tapa8260acc-e8 --physdev-is-bridged -j neutron-openvswi-ia8260acc-e
-A neutron-openvswi-ia8260acc-e -m state --state INVALID -j DROP
-A neutron-openvswi-ia8260acc-e -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-ia8260acc-e -p icmp -j RETURN
-A neutron-openvswi-ia8260acc-e -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-ia8260acc-e -p udp -m udp -m multiport --dports 1:65535 -j RETURN
-A neutron-openvswi-ia8260acc-e -s 10.5.0.44/32 -p tcp -m tcp --dport 5432 -j RETURN
-A neutron-openvswi-ia8260acc-e -s 10.5.0.42/32 -p tcp -m tcp --dport 5432 -j RETURN
-A neutron-openvswi-ia8260acc-e -s 10.5.0.3/32 -p udp -m udp --sport 67 --dport 68 -j RETURN
-A neutron-openvswi-ia8260acc-e -j neutron-openvswi-sg-fallback

This is a fresh install of Icehouse on Ubuntu 14.04. Any advice would be very much appreciated.

Wireshark (tshark -i tapa8260acc-e8 'host 10.5.0.42') on the tap interface for vm2 shows no RST packets reaching it, but they do reach the qbra8260acc-e8 interface.

Capturing on 'tapa8260acc-e8'
  1   0.000000    10.5.0.43 -> 10.5.0.42    TCP 74 54667 > postgresql ...

Comments

I am using the OVS agent. Looking with Wireshark I can see the packet leaving the interface of vm1 and it looks OK, but it gets dropped as invalid by the compute node. I'm not sure what else I can see with Wireshark? By the time it hits tapa8260acc-e8 associated with vm2 it is marked invalid by nf.

snewpy (2014-04-29 06:41:15 -0500)

can you show tcpdump -vvv on the qbra8260acc-e8 interface?

darragh-oreilly (2014-04-29 08:29:57 -0500)

I updated the question with the output of the tcpdump on that interface.

snewpy (2014-04-29 08:36:35 -0500)

I don't see it

darragh-oreilly (2014-04-29 08:46:27 -0500)

I think you have to click the (more) button to see the whole question.

snewpy (2014-04-29 08:47:41 -0500)

4 answers

1

answered 2015-07-30 16:54:35 -0500

kevinbenton

Root Cause

This issue is caused by the iptables setup in the reference OVS implementation in Neutron.

Each VM gets its own filtering bridge, so the path of a packet between two VMs on the same host looks like this:

VM1 -> bridge1 (iptables filtering) -> OVS -> bridge2 (iptables filtering) -> VM2

In this setup each packet goes through a conntrack lookup twice (once on each bridge), and the conntrack state is shared between the filtering bridges. This is normally not a problem, because conntrack keeps track of both sides of the TCP connection. The issue comes with the RST flag.

When conntrack encounters a TCP packet with a RST flag it immediately destroys the conntrack entry for that connection. This means that once the RST packet reaches the second filtering bridge, the conntrack state has already been removed, so the RST packet is marked as INVALID.

VM1 -> bridge1 (iptables filtering) -> OVS -> bridge2 (iptables filtering) -> VM2
RST >> conntrack destroys conn.     >>>>>>>>> no match, INVALID DROP
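
The sequence above can be modelled with a toy connection table. This is only a sketch: SharedConntrack, classify and traverse are hypothetical names, and the state logic is a deliberate simplification of the behaviour described in this answer, not netfilter's real state machine. The addresses and ports are the ones from the question.

```python
class SharedConntrack:
    """Toy connection table shared by every filtering bridge on a host."""

    def __init__(self):
        self.entries = set()

    def classify(self, src, sport, dst, dport, flags):
        # Normalize the endpoints so both directions map to the same entry.
        key = frozenset([(src, sport), (dst, dport)])
        if flags == {"SYN"}:
            # A bare SYN creates (or re-uses) the entry and counts as NEW.
            self.entries.add(key)
            return "NEW"
        if key not in self.entries:
            return "INVALID"
        if "RST" in flags:
            # Simplification: the entry is removed as soon as the RST is seen.
            self.entries.discard(key)
        return "ESTABLISHED"


def traverse(conntrack, bridges, packet):
    """Return the state each bridge sees for one packet, in path order."""
    return [(bridge, conntrack.classify(*packet)) for bridge in bridges]


ct = SharedConntrack()
syn = ("10.5.0.43", 55265, "10.5.0.42", 5432, {"SYN"})
rst = ("10.5.0.42", 5432, "10.5.0.43", 55265, {"RST", "ACK"})

syn_path = traverse(ct, ["bridge2", "bridge1"], syn)  # vm2 -> vm1
rst_path = traverse(ct, ["bridge1", "bridge2"], rst)  # vm1 -> vm2
print(syn_path)
print(rst_path)
```

In this model the first bridge on the return path still matches the RST against the entry, while the second bridge finds nothing and reports INVALID, which is the drop seen in the question.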

If you run conntrack -E -o timestamp while attempting to make a connection that causes a RST, you can see the RST is destroying the state in conntrack:

~$ sudo conntrack -E -o timestamp
[1438290214.284944]     [NEW] tcp      6 120 SYN_SENT src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1
[1438290214.285129] [DESTROY] tcp      6 src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1
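
If there is a lot of event output to look through, a short script can pair up NEW and DESTROY events and flag flows that were torn down almost immediately. This is a sketch under assumptions: it expects lines shaped like the two above, parse_event and short_lived_flows are hypothetical helper names, and the 10 ms threshold is an arbitrary choice.

```python
import re

# Matches "[timestamp] [EVENT] rest-of-line" as printed with -o timestamp.
EVENT_RE = re.compile(r"^\[(?P<ts>\d+\.\d+)\]\s+\[(?P<event>[A-Z]+)\]\s+(?P<rest>.*)$")


def parse_event(line):
    """Return (timestamp, event, flow) or None if the line does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    # The first src/dst/sport/dport on the line describe the original direction.
    flow = tuple(
        re.search(rf"\b{key}=(\S+)", rest).group(1)
        for key in ("src", "dst", "sport", "dport")
    )
    return float(match.group("ts")), match.group("event"), flow


def short_lived_flows(lines, max_lifetime=0.01):
    """List (flow, lifetime) for flows destroyed within max_lifetime seconds."""
    started, result = {}, []
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        ts, event, flow = parsed
        if event == "NEW":
            started[flow] = ts
        elif event == "DESTROY" and flow in started:
            lifetime = ts - started.pop(flow)
            if lifetime <= max_lifetime:
                result.append((flow, lifetime))
    return result


# The two event lines from the output above.
SAMPLE = [
    "[1438290214.284944]     [NEW] tcp      6 120 SYN_SENT src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1",
    "[1438290214.285129] [DESTROY] tcp      6 src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1",
]

flows = short_lived_flows(SAMPLE)
print(flows)
```

For the sample above, the single flow from 10.0.0.9 to 10.0.0.10 port 99 is reported with a lifetime of well under a millisecond.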

The Fix

There is a bug open for this behavior here: https://bugs.launchpad.net/neutron/+bug/1478925

However, it won't be fixed for Icehouse because that release is already EOL. It will be fixed in Liberty, but whether it can be back-ported to Juno and Kilo will depend on how complex the fix is.

This can be fixed with conntrack zones, which were only added in Kilo, so if that's the route taken it won't make it to Juno. It can also be fixed with a hack to get iptables to skip the DESTROY phase, but that would leave TCP states that were RST open until they expire, so it's not likely to be an acceptable solution.
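
To see why zones would help, here is a rough model, assuming each filtering bridge does its lookups in its own zone. ZonedConntrack, rst_states and the zone numbers are made up for illustration, and this says nothing about how the eventual Neutron fix assigns zones.

```python
class ZonedConntrack:
    """Toy connection table keyed by (zone, connection)."""

    def __init__(self):
        self.entries = set()

    def classify(self, zone, packet):
        src, sport, dst, dport, flags = packet
        # The zone is part of the key, so zones never share an entry.
        key = (zone, frozenset([(src, sport), (dst, dport)]))
        if flags == {"SYN"}:
            self.entries.add(key)
            return "NEW"
        if key not in self.entries:
            return "INVALID"
        if "RST" in flags:
            # Simplification: only the entry in this zone is removed.
            self.entries.discard(key)
        return "ESTABLISHED"


def rst_states(zone_of_bridge):
    """Send a SYN vm2 -> vm1, then return what each bridge sees for the RST."""
    ct = ZonedConntrack()
    syn = ("10.5.0.43", 55265, "10.5.0.42", 5432, {"SYN"})
    rst = ("10.5.0.42", 5432, "10.5.0.43", 55265, {"RST", "ACK"})
    for bridge in ("bridge2", "bridge1"):
        ct.classify(zone_of_bridge[bridge], syn)
    return [ct.classify(zone_of_bridge[b], rst) for b in ("bridge1", "bridge2")]


shared = rst_states({"bridge1": 0, "bridge2": 0})    # one zone for both bridges
separate = rst_states({"bridge1": 1, "bridge2": 2})  # one zone per bridge
print(shared, separate)
```

With a single shared zone the second bridge sees the RST as INVALID; with one zone per bridge, each bridge tears down only its own entry and both still match the RST.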


Comments

thanks kevin, makes sense. I didn't see this with the linuxbridge agent because it uses only one bridge per network on each node.

darragh-oreilly (2015-07-31 13:28:47 -0500)
1

answered 2014-04-28 23:22:12 -0500

SGPJ

I have a detailed answer on disabling the iptables rules; please refer to the link.

Disable the Neutron security group rules, then modify the nova-base nwfilter. I deleted no-ip-spoofing and no-arp-spoofing.

@HOST

$ virsh nwfilter-dumpxml nova-base
<filter name="nova-base" chain="root">
  <uuid>83f28c40-429a-4e50-80a7-0a2b70a2d210</uuid>
  <filterref filter="no-mac-spoofing"/>
  <filterref filter="no-ip-spoofing"/>
  <filterref filter="no-arp-spoofing"/>
  <filterref filter="allow-dhcp-server"/>
</filter>

$ virsh nwfilter-edit nova-base
Network filter nova-base XML configuration edited.

$ virsh nwfilter-dumpxml nova-base
<filter name="nova-base" chain="root">
  <uuid>83f28c40-429a-4e50-80a7-0a2b70a2d210</uuid>
  <filterref filter="no-mac-spoofing"/>
  <filterref filter="allow-dhcp-server"/>
</filter>

I pinged vm1 from vm2 again. This time, it succeeded.

Thanks.


Comments

I tried removing no-ip-spoofing and no-arp-spoofing from nwfilter in libvirt but it didn't make any difference. My problem isn't pinging or general network connectivity, it's specifically that valid, related RST packets are being marked as INVALID by netfilter.

snewpy (2014-04-29 06:51:24 -0500)

You need to manually edit the .xml files to achieve it.

SGPJ (2014-04-29 07:04:03 -0500)

I'm not trying to spoof MAC or IP addresses, and I have network connectivity other than RST packets, so I think this is not the same problem. The interfaces in the instances' XML files don't contain any filterref nodes, so I don't think that changing these will help.

snewpy (2014-04-29 07:17:45 -0500)
0

answered 2015-02-02 00:00:13 -0500

eyosef

Hi, I'm having the same problem. It all comes from conntrack.

The RST packet is marked as INVALID because there is no related entry for it in conntrack.

When the VMs are placed on different compute nodes, the SYN packet adds a new record to conntrack with the SYN_SENT value and UNREPLIED state. Once the RST packet returns, it is related to this flow and everything is OK.

When the VMs are placed on the same compute node, the SYN packet is not tracked by conntrack, so when the RST packet returns it is not related to any existing flow and is dropped.

I used - watch -n 0.5 'cat /proc/net/nf_conntrack | grep My_port_XX'

I'm running with kernel version 2.6.32

0

answered 2014-04-29 03:44:11 -0500

darragh-oreilly

updated 2014-04-29 08:26:22 -0500

I don't see this with an Ubuntu compute node, cirros instances, and the linuxbridge agent. Instead I see the RST packet hit the RELATED,ESTABLISHED rule. You could use Wireshark to look closer at the packets, and maybe see why the return packet hits the INVALID rule.

$ tcpdump -tvvv -r dump-reset 
reading from file dump-reset, link-type EN10MB (Ethernet)
IP (tos 0x0, ttl 64, id 56219, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.5.57436 > 10.0.0.4.9000: Flags [S], cksum 0x1437 (incorrect -> 0x16a1), seq 4247068559, win 14600, options [mss 1460,sackOK,TS val 377876 ecr 0,nop,wscale 2], length 0
IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.0.0.4.9000 > 10.0.0.5.57436: Flags [R.], cksum 0x7b8d (correct), seq 0, ack 4247068560, win 0, length 0

Comments

I posted a wireshark excerpt above, there doesn't seem to be any obvious reason why it is invalid. They are valid enough to reach the qbra8260acc-e8 interface (associated with vm2), but do not reach the tap interface for vm2.

snewpy (2014-04-29 08:31:21 -0500)

Stats

Asked: 2014-04-28 14:52:53 -0500

Seen: 4,771 times

Last updated: Jul 30 '15