
Instance directly connected to provider network does not receive DHCP reply

asked 2013-07-23 11:39:03 -0500

michiel-a

Short description of problem: In an environment where virtual routers and floating IP addresses work fully, an instance connected directly to the provider network cannot communicate with that network. More specifically, the instance sends a DHCP request but never receives the DHCP reply, even though the reply is visible on the physical network.

Details on the environment: 1 controller node, 2 compute nodes, and 2 network nodes.

All nodes have access to a management network (eth0). The compute and network nodes also have access to a tenant network (OVS GRE on eth1) and a provider network (eth2). eth0 and eth1 have an IP address; eth2 is configured without one.

All nodes run as virtual machines in a manually maintained VMware environment. The only special change required was to allow promiscuous mode on the provider network; without it, the virtual routers on the network nodes, for example, did not receive traffic on eth2.

What does work:
- create tenant network
- create provider network
- create router
- create instance connected to tenant network
- create floating IP and assign it to the instance
- test network traffic from instance to provider network (checked whether the source IP of the ping is the floating IP: yes)
- test network traffic from provider network to floating IP (enable port 22 on the security group and try ssh'ing to the instance): works

The following steps are of interest:
- create a new instance connected to the provider network
- check (using "nova list" and "nova show") whether an IP address was provisioned from the provider network: yes
- once the instance boots, log in to the instance (via console) and see whether it can get the IP address from the DHCP server: no
- statically set the IP address on the interface (inside the instance, using "ifconfig eth0 x.x.x.x netmask x.x.x.x") and test communication: fails, both from the instance to the provider network and the other way around

Starting from the drawing in http://docs.openstack.org/trunk/openstack-network/admin/content/under_the_hood_openvswitch.html I started debugging to find where the DHCP packets get lost.

The DHCP request packet travels all the way from the instance to the dnsmasq process on one of the network nodes. One thing that caught my attention is that all packets seem to be duplicated for some reason, but this should not be a problem. This only appears to happen on bridges connected to the provider network. When I look in the qdhcp-xxxx-xxx.. namespace on the network node, for example, I see the packet only once, so I will ignore this for now.
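To make the duplication easier to spot in a long capture, something like the following could be used. It is only a rough sketch: it assumes the capture was saved as one-line-per-packet tcpdump text output in which each line starts with a timestamp followed by a space, and the function name `find_duplicates` and the sample lines are made up for illustration, not taken from the real captures.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicates(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (packet summary, count) for every summary seen more than once.

    Assumption: each line is "<timestamp> <rest of packet summary>".
    Only the part after the timestamp is compared.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _timestamp, _, summary = line.partition(" ")
        counts[summary] += 1
    return [(summary, n) for summary, n in counts.items() if n > 1]


# Invented placeholder lines, NOT real capture output.
sample = [
    "10:00:00.000001 pkt-A",
    "10:00:00.000009 pkt-A",
    "10:00:00.000020 pkt-B",
]
duplicates = find_duplicates(sample)
print(duplicates)
```

Note that this counts identical summaries anywhere in the capture, so legitimate retransmissions would be flagged as well; it is a quick filter, not proof of an echo.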

dnsmasq on the network node answers with a DHCP reply packet, which reaches the provider network and arrives back at the eth2 interface of the compute node:

(node2 is the compute node where the instance is running, verified fa:16:3e:47:d9:1b is the mac of the instance, and 192.168.103.230 is the ip ... (more)


4 answers


answered 2013-07-30 07:01:18 -0500

michiel-a

I discovered http://ask.openstack.org and also posted the question there ( https://ask.openstack.org/question/3555/instance-directly-connected-to-provider-network-does-not-receive-dhcp-reply/ ). If this is unwanted, please let me know.


answered 2013-08-05 07:58:00 -0500

michiel-a

Posted a possible solution in https://ask.openstack.org/en/question/3555/instance-directly-connected-to-provider-network-does-not-receive-dhcp-reply/?answer=3740#post-id-3740

The environment might have been externally influenced, even though the tcpdump captures strongly suggested a problem in Open vSwitch. After a complete redeploy of the environment over the last few days, provider networks work out of the box.


answered 2013-08-13 07:18:05 -0500

michiel-a

Unfortunately, the possible solution did not fully work out. Newly deployed environments still have this problem, so I still need some assistance.


answered 2013-08-13 08:55:28 -0500

michiel-a

Found the cause: the duplicated traffic (see the tcpdump captures) corrupts the MAC learning tables. The duplication is caused by an error in the physical network configuration.

The compute node's eth2 (the compute node is a virtual machine within the VMware environment) receives echoes of its own outgoing traffic; see the tcpdump captures. This corrupts the MAC learning tables in the Open vSwitch bridges, making them think that the instance's MAC address is located behind eth2 instead of on the bridges deeper inside the compute node. As a result, when Open vSwitch receives the DHCP reply, it sends it back out of eth2 instead of forwarding it to the instance.
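The effect can be illustrated with a toy model of a learning switch. This is a minimal sketch under stated assumptions, not Open vSwitch code: it collapses the chain of bridges on the compute node into a single switch with two ports, the port name `tap` and the dnsmasq MAC address are hypothetical, and the class and function names are invented for the example. Only the instance MAC is taken from the question.

```python
from typing import Dict, Optional


class LearningSwitch:
    """Toy model of a MAC-learning bridge (not Open vSwitch itself)."""

    def __init__(self) -> None:
        # Maps a source MAC to the port it was last seen on.
        self.mac_table: Dict[str, str] = {}

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> Optional[str]:
        """Learn the source MAC, then decide where the frame goes.

        Returns "flood" for an unknown destination, and None when the
        destination was learned on the ingress port (frame is dropped).
        """
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            return "flood"
        if out_port == in_port:
            return None
        return out_port


INSTANCE_MAC = "fa:16:3e:47:d9:1b"  # instance MAC from the question
DHCP_MAC = "fa:16:3e:00:00:01"      # hypothetical MAC of the dnsmasq port
BROADCAST = "ff:ff:ff:ff:ff:ff"


def dhcp_exchange(echo: bool) -> Optional[str]:
    """Return the port the DHCP reply is forwarded to, or None if lost."""
    switch = LearningSwitch()
    # The DHCP request leaves the instance through its tap port.
    switch.receive("tap", INSTANCE_MAC, BROADCAST)
    if echo:
        # The physical network reflects the same frame back into eth2,
        # so the switch relearns the instance MAC on eth2.
        switch.receive("eth2", INSTANCE_MAC, BROADCAST)
    # The DHCP reply arrives from the provider network on eth2.
    return switch.receive("eth2", DHCP_MAC, INSTANCE_MAC)


print(dhcp_exchange(echo=False))  # reply is forwarded to the tap port
print(dhcp_exchange(echo=True))   # reply never reaches the instance
```

In this single-switch model the reply is dropped because its destination appears to live on the port it arrived on; with the chained bridges in the real setup the reply is pushed back towards eth2 instead. Either way the instance never sees it.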

The echoing only happens if the policy on VMware's vSwitch is set to allow promiscuous traffic. However, the problem lies not within the VMware environment but in the physical switches connecting the physical VMware servers. I haven't found the exact cause there yet, but when I connected my environment to an isolated set of networks it started working.

Stats

Asked: 2013-07-23 11:39:03 -0500

Seen: 118 times

Last updated: Aug 13 '13