OVS doesn't forward packets from tap-device to br-int

asked 2013-04-12 17:50:06 -0500

markus-sendingthesea

Hi all,

I'm really stuck with my network setup. I'm using Quantum with the OVS plugin. After installation everything worked, but since a reboot it no longer does: I can't reach the instance IPs any more. So I followed the packets from the compute node to the network node and could figure out where it breaks.

From compute-node I can see DHCP packets start their journey:

root@compute1:~# tcpdump -nn -i br-int
tcpdump: WARNING: br-int: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-int, link-type EN10MB (Ethernet), capture size 65535 bytes
19:41:15.067727 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:4a:33:9d, length 286
19:41:18.070965 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:4a:33:9d, length 286
19:41:22.588759 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:c4:29:fa, length 300

Those packets also make it to the network-node...

root@network:~# tcpdump -nn -i br-int
tcpdump: WARNING: br-int: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-int, link-type EN10MB (Ethernet), capture size 65535 bytes
19:43:11.132307 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:4a:33:9d, length 286
19:43:14.135490 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:4a:33:9d, length 286
19:43:17.434270 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:c4:29:fa, length 300

...and even get to the tap66178edc-18 device that is configured for DHCP. As far as I can see, the DHCP server also answers the DHCP request:

root@network:~# tcpdump -nn -i tap66178edc-18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap66178edc-18, link-type EN10MB (Ethernet), capture size 65535 bytes
19:44:20.399708 ARP, Request who-has 10.5.5.3 tell 10.5.5.2, length 28
19:44:21.399712 ARP, Request who-has 10.5.5.3 tell 10.5.5.2, length 28
19:44:22.403720 ARP, Request who-has 10.5.5.3 tell 10.5.5.2, length 28
19:44:27.684714 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:c4:29:fa, length 300
19:44:27.685000 IP 10.5.5.2.67 > 10.5.5.3.68: BOOTP/DHCP, Reply, length 308
19:44:32.687714 ARP, Request who-has 10.5.5.3 tell 10.5.5.2, length 28
19:44:33.687725 ARP, Request who-has 10.5.5.3 tell 10.5.5.2, length 28
19:44:34.687720 ARP, Request who-has 10.5 ... (more)


20 answers

answered 2013-04-17 20:37:51 -0500

markus-sendingthesea

Hey Guys,

thanks a lot for your help so far. I'm short on time over the next few days, but I will do more tests next week. I'll keep you up to date.

answered 2013-04-16 19:23:13 -0500

markus-sendingthesea

Hi Darragh,

Yes I have configured a router and set the id in l3_agent.ini:

# If use_namespaces is set as False then the agent can only configure one router.
# This is done by setting the specific router_id.
router_id = cc2cd248-ce60-47ff-841a-6cbf6915b86c

root@network:/etc/quantum# quantum router-list
+--------------------------------------+-----------------+--------------------------------------------------------+
| id                                   | name            | external_gateway_info                                  |
+--------------------------------------+-----------------+--------------------------------------------------------+
| cc2cd248-ce60-47ff-841a-6cbf6915b86c | provider-router | {"network_id": "f665454d-8159-4d13-a40f-d3d1f0bc1149"} |
+--------------------------------------+-----------------+--------------------------------------------------------+
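For anyone repeating this check, the comparison between the configured router_id and the router list can be scripted. This is only a minimal sketch: the function names `configured_router_id` and `router_exists` are made up for illustration, and the path /etc/quantum/l3_agent.ini in the usage line is an assumption based on the prompt shown above.

```shell
#!/bin/sh
# Print the value of the first uncommented "router_id = ..." line read from stdin.
# Commented lines such as "# router_id =" are ignored.
configured_router_id() {
    sed -n 's/^[[:space:]]*router_id[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Succeed if the ID given as $1 appears anywhere in the text on stdin,
# for example in the table printed by `quantum router-list`.
router_exists() {
    grep -q -F -- "$1"
}
```

Possible usage (path assumed): `id=$(configured_router_id < /etc/quantum/l3_agent.ini); quantum router-list | router_exists "$id" && echo "router_id matches an existing router"`.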

But I think my problem starts already on layer 2, because ICMP doesn't get from the network-node to the instance as mentioned in #10.

Markus.

answered 2013-04-16 17:55:29 -0500

darragh-oreilly

you are currently not using namespaces - that means you need to do this:

# If use_namespaces is set as False then the agent can only configure one router.
# This is done by setting the specific router_id.
router_id = 1064ad16-36b7-4c2f-86f0-daa2bcbd6b2a

Have you done that?

Darragh.


answered 2013-04-16 14:42:52 -0500

I have had similar problems recently in my environment. The dhcp-agent got the dhcp-request, but the reply didn't find its way back to instance. In the end... clearing "/etc/openvswitch/conf.db" and rebooting seemed to do the trick.

answered 2013-04-16 14:21:55 -0500

markus-sendingthesea

Hmm, ok. I'll give it a try. But looking at the quantum.conf I found this:

# Enable or disable overlapping IPs for subnets
# Attention: the following parameter MUST be set to False if Quantum is
# being used in conjunction with nova security groups and/or metadata service.
allow_overlapping_ips = False

So, do I have to disable security groups and metadata service when I turn on "overlapping_ips"? If so, where do I have to do it?

Sorry about all those questions, but that stuff is really confusing.

Thanks again.

answered 2013-04-16 13:55:23 -0500

darragh-oreilly

Hi Markus,

your route table looks funny: each network has 2 routes. I'm not sure what that means. I have always used namespaces, where each namespace (dhcp and routers) has its own route table. As you are not using namespaces, have you added the router id to quantum.conf as described here http://docs.openstack.org/folsom/open... ? I don't know what this does, but maybe it solves the route problem somehow.

Also, I don't see a dnat for a floating ip, but I see metadata support setup - maybe the successful ping from the controller used it.

Darragh.

answered 2013-04-16 13:13:57 -0500

markus-sendingthesea

Darragh,

I tried to ping from the network node, but the ICMP packets aren't forwarded. Before the reboot I could reach the instance from the network node and also from the controller node, as I already have a router set up.

I'm running my test-installation on an ESXi Node, so switching to virtual box wouldn't be that easy. Besides, this is the second try to install OpenStack and both times I got serious problems with the networking stuff. So if it's ok for you I would like to do some more troubleshooting to find out what I'm doing wrong or what has to be done after reboot to get things online again.

I checked the logs under /var/log/quantum, but besides the regular DEBUG messages there is nothing pointing to any errors. The only thing that looks suspicious is that l3-agent.log and openvswitch-agent.log get updates every second. But it seems to be regular agent work that gets logged because of the debug config.

I've also restarted l3-agent, but it hasn't solved the problem.

So here are the requested settings:

root@network:/var/log/quantum# ovs-vsctl show
63aca6ee-33fd-44f5-828c-9ebff952445c
    Bridge br-int
        Port "qr-27a2e49b-d0"
            tag: 1
            Interface "qr-27a2e49b-d0"
                type: internal
        Port "tap66178edc-18"
            tag: 1
            Interface "tap66178edc-18"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.10.10.233"}
    Bridge br-ex
        Port "eth2"
            Interface "eth2"
        Port br-ex
            Interface br-ex
                type: internal
        Port "qg-9ba11a5d-c3"
            Interface "qg-9ba11a5d-c3"
                type: internal
    ovs_version: "1.4.0+build0"
root@network:/var/log/quantum# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9d:b1:67 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.232/24 brd 192.168.0.255 scope global eth0
    inet6 fe80::250:56ff:fe9d:b167/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9d:e6:75 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.232/24 brd 10.10.10.255 scope global eth1
    inet6 fe80::250:56ff:fe9d:e675/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9d:8c:95 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::250:56ff:fe9d:8c95/64 scope link
       valid_lft forever preferred_lft forever
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether aa:7a:ae:66:32 ... (more)

answered 2013-04-15 19:23:18 -0500

markus-sendingthesea

Thanks for the explanation regarding the bridges, that makes things clearer.

OK, as far as I could figure it out, it seems that the flows are set by quantum-plugin-openvswitch-agent. After restarting openvswitch-switch the flows disappeared and the plugin restart brought them back. It currently looks like this:

root@network:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=229.923s, table=0, n_packets=16, n_bytes=2032, priority=3,tun_id=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:1,output:1
 cookie=0x0, duration=229.985s, table=0, n_packets=43, n_bytes=3314, priority=4,in_port=1,dl_vlan=1 actions=set_tunnel:0x1,NORMAL
 cookie=0x0, duration=229.583s, table=0, n_packets=0, n_bytes=0, priority=3,tun_id=0x1,dl_dst=fa:16:3e:40:8d:d8 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=229.859s, table=0, n_packets=0, n_bytes=0, priority=3,tun_id=0x1,dl_dst=fa:16:3e:74:b3:d5 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=230.887s, table=0, n_packets=75, n_bytes=8492, priority=1 actions=drop

root@compute1:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=24.642s, table=0, n_packets=2, n_bytes=180, priority=3,tun_id=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:1,output:1
 cookie=0x0, duration=24.71s, table=0, n_packets=8, n_bytes=1008, priority=4,in_port=1,dl_vlan=1 actions=set_tunnel:0x1,NORMAL
 cookie=0x0, duration=24.575s, table=0, n_packets=2, n_bytes=92, priority=3,tun_id=0x1,dl_dst=fa:16:3e:64:c0:46 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=25.589s, table=0, n_packets=0, n_bytes=0, priority=1 actions=drop
root@compute1:~#

Dumping on my eth1 devices shows me that incoming and outgoing traffic is now tagged with 0x1. The instance also got an IP from DHCP, and I can see a lot of ARP and DHCP packets that look good so far. But when I try to ping my instance I don't get a reply. The ICMP packet doesn't show up in my dump either, so it seems not to be forwarded by the bridge. The drop rule on the network node shows an increasing number of packets. So is there still something missing? I read in some other discussions that enabling namespaces solved some flow problems. I don't know if this is the same issue here; might it help?
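To watch the drop rule without reading the whole flow table each time, the packet counter can be pulled out of the `ovs-ofctl dump-flows` output. A minimal sketch, assuming the output format shown above; the function name `drop_packet_counts` is hypothetical.

```shell
#!/bin/sh
# Read `ovs-ofctl dump-flows` output on stdin and print the n_packets
# value of every flow whose action is "drop", one value per line.
drop_packet_counts() {
    awk '/actions=drop/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^n_packets=/) {
                n = $i
                sub(/^n_packets=/, "", n)   # strip the key
                sub(/,$/, "", n)            # strip the trailing comma
                print n
            }
        }
    }'
}
```

Running `ovs-ofctl dump-flows br-tun | drop_packet_counts` twice a few seconds apart, while pinging the instance, shows whether the ping traffic is what ends up in the drop rule.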

answered 2013-04-15 16:53:43 -0500

darragh-oreilly

I think the reason for the hybrid driver is that iptables doesn't work with interfaces that are in OVS bridges. So the hybrid driver creates a new Linux bridge (qbr) just for the tap device, and the filtering can be done on it. This bridge is linked to the OVS br-int bridge with the qvb and qvo veth pair.

In your case the packets on compute1 should be returning to the qbr bridge via eth1/br-tun/br-int. So now I see that the GRE packets from compute->network have 0x1 for the key, while those coming in the opposite direction have 0x0, which probably means no key, which probably means the ovs-agent on the network node did not set up br-tun with the proper flows.

To see the flows, do this on both nodes:

$ sudo ovs-ofctl dump-flows br-tun
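When comparing the two nodes, it can help to reduce each flow dump to just the tunnel IDs it mentions. The helper below is only a sketch under the assumption that the flows use `tun_id=` matches and `set_tunnel:` actions as in this thread; `tunnel_ids` is a made-up name, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Read `ovs-ofctl dump-flows` output on stdin and print the distinct
# tunnel IDs that appear in tun_id= matches or set_tunnel: actions.
tunnel_ids() {
    grep -o -E '(tun_id=|set_tunnel:)0x[0-9a-fA-F]+' |
        sed -E 's/^[^=:]*[=:]//' |
        sort -u
}
```

If `sudo ovs-ofctl dump-flows br-tun | tunnel_ids` prints nothing on one node, that node has no tunnel flows at all; if the two nodes print different IDs, the keys do not match.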

I think restarting the ovs agent on netnode should rebuild br-tun.

Darragh.

answered 2013-04-28 19:46:53 -0500

markus-sendingthesea

Hi all,

I finally figured out what goes wrong during reboot. The interfaces on the open-vswitch created by quantum don't get cleaned up after reboot and aren't accessible after that. As a workaround I patched the upstart scripts to delete the open-vswitch ports before startup, so that they are created again by quantum. With these patches applied I get a working quantum setup even after a reboot.

root@network:/etc/init# diff -u quantum-l3-agent.orig quantum-l3-agent.conf
--- quantum-l3-agent.orig 2013-04-28 21:13:23.720191038 +0200
+++ quantum-l3-agent.conf 2013-04-28 21:35:06.739991637 +0200
@@ -9,6 +9,15 @@
 pre-start script
   mkdir -p /var/run/quantum
   chown quantum:root /var/run/quantum
+
+  ovsvsctl='/usr/bin/ovs-vsctl'
+  for br in br-int br-ex; do
+    if IFACE=$(${ovsvsctl} list-ports $br|grep ^q) ; then
+      for i in $IFACE; do
+        ${ovsvsctl} del-port $br $i
+      done
+    fi
+  done
 end script

root@network:/etc/init# diff -u quantum-dhcp-agent.orig quantum-dhcp-agent.conf
--- quantum-dhcp-agent.orig 2013-04-28 21:27:59.105279978 +0200
+++ quantum-dhcp-agent.conf 2013-04-28 21:33:18.985337672 +0200
@@ -9,6 +9,14 @@
 pre-start script
   mkdir -p /var/run/quantum
   chown quantum:root /var/run/quantum
+
+  ovsvsctl='/usr/bin/ovs-vsctl'
+  if IFACE=$(${ovsvsctl} list-ports br-int|grep ^tap) ; then
+    for i in $IFACE; do
+      ${ovsvsctl} del-port br-int $i
+    done
+  fi
+
 end script
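The cleanup logic from both patches can also be written as one reusable function, which makes it easier to try out before touching the upstart scripts. This is a sketch, not the patch itself: the `OVS_VSCTL` variable and the `delete_stale_ports` name are assumptions introduced here so that the command can be swapped for a stub; only `ovs-vsctl list-ports` and `ovs-vsctl del-port` come from the patches above.

```shell
#!/bin/sh
# Command used to talk to Open vSwitch. Overridable (for example with a
# stub function) so the logic can be exercised without a real switch.
: "${OVS_VSCTL:=ovs-vsctl}"

# Delete every port on bridge $1 whose name starts with prefix $2.
# Ports with other names (patch-tun, the bridge's own port) are left alone.
delete_stale_ports() {
    bridge=$1
    prefix=$2
    "$OVS_VSCTL" list-ports "$bridge" | while IFS= read -r port; do
        case $port in
            "$prefix"*) "$OVS_VSCTL" del-port "$bridge" "$port" </dev/null ;;
        esac
    done
}
```

The equivalent of the two patches would be `delete_stale_ports br-int q`, `delete_stale_ports br-ex q` and `delete_stale_ports br-int tap`. Note that this removes ports for real, so it should only run before the agents start, as in the pre-start sections above.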

BTW: Another thing I figured out is that there shouldn't be an IP address configured on br-ex. As long as an IP was up, the SNAT iptables rule wasn't hit.
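Whether br-ex still carries an IPv4 address can be checked from the one-line output of `ip -o -4 addr show dev br-ex`. A minimal sketch, with the hypothetical helper name `ipv4_addresses`:

```shell
#!/bin/sh
# Read `ip -o -4 addr show` output on stdin and print each IPv4
# address with its prefix length, one per line. No output means
# the interface has no IPv4 address configured.
ipv4_addresses() {
    awk '$3 == "inet" { print $4 }'
}
```

For example, `ip -o -4 addr show dev br-ex | ipv4_addresses` should print nothing on a network node set up as described above.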

Thanks again for your help and good luck to anyone trying to set up OpenStack ;-)

Cheers Markus


Stats

Asked: 2013-04-12 17:50:06 -0500

Seen: 1,472 times

Last updated: Apr 28 '13