
Instance directly connected to provider network does not receive DHCP reply

asked 2013-07-30 01:59:28 -0500

Michiel K

updated 2013-08-05 02:41:24 -0500

Short description of the problem

In an environment where virtual routers and floating IP addresses work fully, an instance connected directly to the provider network cannot communicate with the network. More specifically, it sends a DHCP request but never receives a DHCP reply, even though the reply is visible on the physical network.

Details on the environment

1 controller node, 2 compute nodes, 2 network nodes. OpenStack is configured to use Quantum/Neutron as the network service.

All nodes have access to a management network (eth0). The compute and network nodes also have access to a tenant network (OVS GRE on eth1) and a provider network (eth2). eth0 and eth1 have IP addresses; eth2 is configured without one.

All nodes run as virtual machines in a manually maintained VMware environment. The only specialized change required was to allow promiscuous mode on the provider network; otherwise, for example, the virtual routers on the network nodes did not receive any traffic on eth2.

What does work

  • create tenant network
  • create provider network
  • create router
  • create instance connected to tenant network
  • create a floating IP and assign it to the instance
  • test network traffic from the instance to the provider network (verified that the source IP of the ping is the floating IP: yes)
  • test network traffic from the provider network to the floating IP (allow port 22 in the security group and SSH to the instance): works

Details of the problem

  • create a new instance connected to the provider network: works
  • check (using "nova list" and "nova show") whether an IP address was allocated from the provider network: yes.
  • once the instance boots, log in (via the console) and check whether it obtains the IP address from the DHCP server: no.
  • statically assign the IP address to the interface (inside the instance, using "ifconfig eth0 x.x.x.x netmask x.x.x.x") and test communication: fails in both directions, from the instance to the provider network and vice versa.

Starting from the drawing in http://docs.openstack.org/trunk/openstack-network/admin/content/under_the_hood_openvswitch.html I started debugging where the DHCP packets get lost.
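To follow a DHCP packet hop by hop, it helps to run tcpdump on each interface between the instance and eth2. The sketch below only builds those commands; it assumes the OVS hybrid-plugging layout from the linked drawing (tap, qbr, qvb/qvo veth pair, then the int-br-ex/phy-br-ex veth pair between br-int and br-ex) and that device names use the first 11 characters of the port UUID. The port UUID and the helper names are hypothetical; check the real names with ovs-vsctl show and ip link on the node.

```python
import shlex
from typing import List


def hop_interfaces(port_id: str, phys_bridge: str = "br-ex", phys_nic: str = "eth2") -> List[str]:
    """Interface names from the instance outwards to the physical NIC (assumed naming)."""
    short = port_id[:11]  # assumption: devices are named with a truncated port UUID
    return [
        f"tap{short}",         # tap device attached to the instance
        f"qbr{short}",         # Linux bridge used for security-group iptables rules
        f"qvb{short}",         # veth end plugged into the Linux bridge
        f"qvo{short}",         # veth end plugged into br-int
        f"int-{phys_bridge}",  # br-int side of the br-int <-> br-ex veth pair
        f"phy-{phys_bridge}",  # br-ex side of the same veth pair
        phys_nic,              # NIC attached to the provider network
    ]


def tcpdump_commands(port_id: str) -> List[str]:
    """One tcpdump command per hop, printing MAC headers (-e) for DHCP traffic only."""
    return [
        shlex.join(["tcpdump", "-n", "-e", "-i", iface, "port 67 or port 68"])
        for iface in hop_interfaces(port_id)
    ]


if __name__ == "__main__":
    # Hypothetical port UUID, for illustration only.
    for cmd in tcpdump_commands("3f2a9c1e-5b7d-4e21-9a0b-1c2d3e4f5a6b"):
        print(cmd)
```

Printing the MAC headers with -e matters here, because the interesting question is which MAC addresses appear on which hop, not just whether a DHCP packet is present.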

The DHCP request travels all the way from the instance to the dnsmasq process on one of the network nodes. One thing that catches my attention is that all packets appear to be duplicated, which I assume should not be a problem. This only seems to happen on bridges connected to the provider network. Inside the qdhcp-xxxx-xxx.. namespace on the network node, for example, I see each packet only once, so I will ignore the duplication for now.

dnsmasq on the network node answers with a DHCP reply, which travels over the provider network back into the eth2 interface of the compute node

(node2 is the compute node where the instance is running; I verified that fa:16:3e:47:d9:1b is the MAC of the instance, and 192.168.103.230 ...
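Duplicated frames like these can be spotted mechanically in a capture. This is a minimal sketch, assuming the capture has already been decoded into simple (timestamp, source MAC, destination MAC, payload) records; the Frame type and the 10 ms window are assumptions for illustration, not output of any particular tool. Frames with identical addresses and payload seen within the window are flagged, while genuine retransmissions (usually seconds apart) are not.

```python
from typing import Dict, List, NamedTuple, Tuple


class Frame(NamedTuple):
    ts: float       # capture timestamp in seconds
    src: str        # source MAC address
    dst: str        # destination MAC address
    payload: bytes  # frame payload


def find_duplicates(frames: List[Frame], window: float = 0.01) -> List[Tuple[Frame, Frame]]:
    """Return (first, copy) pairs of identical frames captured within `window` seconds."""
    last_seen: Dict[Tuple[str, str, bytes], Frame] = {}
    duplicates = []
    for frame in sorted(frames, key=lambda f: f.ts):
        key = (frame.src, frame.dst, frame.payload)
        previous = last_seen.get(key)
        if previous is not None and frame.ts - previous.ts <= window:
            duplicates.append((previous, frame))
        last_seen[key] = frame
    return duplicates
```

A duplicate that shows up on eth2 but not inside the instance's own tap device points at something outside the node reflecting traffic back.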


Comments

How have you set tenant_network_type in ovs_quantum_plugin.ini: gre or vlan? I can't tell from the above; it seems you may be trying to use both at the same time.

darragh-oreilly (2013-08-02 04:48:58 -0500)
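One way to answer this question is to read the setting straight from the plugin config. The sketch below parses an ini excerpt with Python's configparser; the [OVS] section and key names follow the OVS Quantum plugin's ini file, but every value in SAMPLE_INI is hypothetical. Note that tenant_network_type only controls what type tenant-created networks get; an admin can still create flat or VLAN provider networks on the physical networks listed in network_vlan_ranges, so GRE tenant networks and a provider network can coexist.

```python
import configparser
from typing import Dict, Optional

# Hypothetical excerpt of ovs_quantum_plugin.ini; values are illustrative only.
SAMPLE_INI = """
[OVS]
tenant_network_type = gre
enable_tunneling = True
tunnel_id_ranges = 1:1000
bridge_mappings = physnet1:br-ex
network_vlan_ranges = physnet1
"""


def ovs_settings(ini_text: str) -> Dict[str, Optional[str]]:
    """Return the settings relevant to the gre-vs-vlan question (None if unset)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    keys = ("tenant_network_type", "bridge_mappings", "network_vlan_ranges")
    return {key: parser.get("OVS", key, fallback=None) for key in keys}


if __name__ == "__main__":
    print(ovs_settings(SAMPLE_INI))
```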

tenant_network_type = gre. I also added more information about the provider networks.

Michiel K (2013-08-05 01:59:48 -0500)

Is the provider network/subnet also being used for the floating IPs? I ask because provider_network1 has router:external=True, and the bridge named br-ex is used for it on the compute node. Is the name br-ex also used for the provider mapping on the network node? It is the default bridge name the L3 agent uses.

darragh-oreilly (2013-08-05 05:08:39 -0500)
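The question boils down to whether one OVS bridge is used both as a provider mapping (bridge_mappings in the OVS plugin config) and as the L3 agent's external bridge (external_network_bridge in the L3 agent config, br-ex by default). A minimal sketch of that comparison follows; the mapping strings are hypothetical, and the function names are made up for illustration.

```python
from typing import Dict, List


def parse_bridge_mappings(value: str) -> Dict[str, str]:
    """Parse 'physnet1:br-ex,physnet2:br-eth3' into {'physnet1': 'br-ex', ...}."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        physnet, bridge = entry.split(":", 1)
        mappings[physnet.strip()] = bridge.strip()
    return mappings


def physnets_on_external_bridge(bridge_mappings: str, external_network_bridge: str = "br-ex") -> List[str]:
    """Physical networks whose bridge is also used by the L3 agent for external traffic."""
    return [
        physnet
        for physnet, bridge in parse_bridge_mappings(bridge_mappings).items()
        if bridge == external_network_bridge
    ]
```

A non-empty result means the same bridge carries both the provider network and the routers' external ports, which is the setup described in this thread.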

"Is the provider network/subnet also being used for the floating IPs?" Yes. "Is the name br-ex also used for the provider mapping on the network node?" Yes. Although I do not know why, provider networks currently work for me (the problem no longer exists). See my answer below.

Michiel K (2013-08-05 07:20:09 -0500)

After a reinstall I got it working, but I don't know how. After some automated deployments it broke again, sometimes losing packets already in br-ex. I haven't got it working since, and packets are always lost somewhere between eth2 and the instance (I always receive the DHCP reply on eth2).

Michiel K (2013-08-13 02:23:34 -0500)

4 answers


answered 2013-08-13 03:54:57 -0500

Michiel K

Found the cause: the duplicated traffic (see the tcpdump captures) corrupts the MAC learning tables. The duplication is caused by an error in the physical network configuration.

Traffic sent out of the compute node's eth2 (the compute node is itself a virtual machine in the VMware environment) comes back in as an echo of itself; see the tcpdump captures. These echoes corrupt the MAC learning tables of the Open vSwitch bridges, making them conclude that the instance's MAC address is located behind eth2 instead of behind the bridges deeper inside the compute node. As a result, when Open vSwitch receives the DHCP reply, it sends it out eth2 instead of forwarding it to the instance.

The echoing only happens if the policy on VMware's vSwitch allows promiscuous mode. However, the problem is not in the VMware environment itself but in the physical switches connecting the physical VMware servers. I haven't found the exact cause there yet, but after I connected my environment to an isolated set of networks, it started working.
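The mechanism can be reproduced with a toy model of a MAC-learning bridge, roughly how an OVS bridge using the NORMAL action behaves. This is a simplified sketch, not Open vSwitch code: the port names (phy-br-ex, eth2) and the DHCP server MAC are assumptions, and only the instance MAC comes from the post. In this model the echoed request moves the instance's MAC to eth2, so the reply arriving on eth2 is dropped (a learning switch does not send a frame back out its ingress port); the post describes it as being sent out eth2. Either way, it never reaches the instance.

```python
from typing import Dict, Iterable, List

INSTANCE_MAC = "fa:16:3e:47:d9:1b"     # instance MAC from the post
DHCP_SERVER_MAC = "fa:16:3e:00:00:01"  # hypothetical dnsmasq port MAC
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy MAC-learning switch: learn the source port, forward by destination."""

    def __init__(self, ports: Iterable[str]):
        self.ports = list(ports)
        self.fdb: Dict[str, str] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: str, src: str, dst: str) -> List[str]:
        """Learn `src` on `in_port`; return the ports the frame is sent out of."""
        self.fdb[src] = in_port
        out_port = self.fdb.get(dst)
        if out_port is None:
            # Unknown or broadcast destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination appears to be behind the ingress port: drop
        return [out_port]


def simulate(echo: bool) -> List[str]:
    """Ports the DHCP reply leaves through, with or without an echoed request."""
    bridge = LearningBridge(["phy-br-ex", "eth2"])
    bridge.receive("phy-br-ex", INSTANCE_MAC, BROADCAST)  # request leaves via eth2
    if echo:
        bridge.receive("eth2", INSTANCE_MAC, BROADCAST)   # echo re-learns MAC on eth2
    return bridge.receive("eth2", DHCP_SERVER_MAC, INSTANCE_MAC)  # the DHCP reply


if __name__ == "__main__":
    print("without echo:", simulate(echo=False))
    print("with echo:", simulate(echo=True))
```

The echo also explains why every packet appeared twice on the provider-facing bridges: the flooded echo is itself forwarded back towards the instance.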


Comments

I have exactly the same problem. Did you find out what was wrong in the configuration of the physical switches? When I turn off promiscuous mode on the node side, the instance MAC on OVS is forwarded correctly via the bridge connections, but then the DHCP reply from the network node is not seen on the comput

slavonic (2019-03-11 15:15:58 -0500)

answered 2013-07-30 14:29:59 -0500

gfa

I'm having the same problem; good to know I'm not alone.

Can you show the output of ovs-vsctl show on node2?

Comments

Added the output of the command as extra information in the post

Michiel K (2013-08-05 02:41:58 -0500)

I thought I had a solution (although I don't know how, since the solution was a reinstallation), but after some automated deployments it failed again.

Michiel K (2013-08-13 02:33:27 -0500)

Cause identified: duplicated traffic.

Michiel K (2013-08-13 03:59:51 -0500)

answered 2013-08-13 03:18:09 -0500

chen-li

Which OS and kernel are you working with?

Comments

Ubuntu 12.04.2 LTS 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Michiel K (2013-08-13 03:46:00 -0500)

answered 2013-10-01 07:15:03 -0500

Marin

I have a somewhat similar setup, with three external/provider VMware networks that should be mapped to the VM in DevStack. I need L2 connectivity to the provider networks. Will that work with GRE tunnels? I'm using the VLAN option of the OVS plugin for Neutron, but no flows are created in OVS, whereas your switch configuration shows some flows.
