
Router interface on provider network replies to ARP requests but nothing else

asked 2016-12-02 04:00:05 -0500

Jurjen

Hi,

I'm trying to learn OpenStack by following the OpenStack Installation Tutorial for Ubuntu, which describes the Newton release. I've chosen networking option 2 in that tutorial, which results in a provider network and a self-service network. Everything looked OK up to the point where the tutorial asks you to ping the router's gateway IP address on the provider network: no matter where I ping from, replies never come.

I noticed that ARP resolution for the gateway IP address did work: when pinging from my desktop machine (which is in the provider network), its ARP table contained the same MAC address that is shown in the output of neutron router-port-list.

Also, when starting an instance it doesn't seem to receive an IP address via DHCP: if I interpret the log correctly, the instance doesn't seem to receive any replies to the DHCPDISCOVERs it sends.

My guess is that once I have solved the ping problem, the DHCP problem will be solved as well.

I double checked whether I followed the tutorial correctly, but I wasn't able to find what I did wrong. I would be grateful if somebody could point me in the right direction.


My setup:

I've created a controller node and one compute node. These are Ubuntu 16.04 LTS virtual machines on an ESXi 6.5 host. The ESXi host has just one physical network port. Each VM has two virtual network interfaces: one connected to the default (untagged) VLAN, another connected to a tagged VLAN3. The vSwitch is configured to allow forged transmits and MAC changes, but is not configured for promiscuous mode. This configuration seems to work correctly: when I configure a static address on the virtual interfaces connected to the provider network (the untagged VLAN), I can ping these from my desktop just fine.

Network:
10.0.0.0/24: VLAN3, gateway 10.0.0.253, routed/NATted to the internet
192.168.178.0/24: default VLAN, gateway 192.168.178.253, routed/NATted to the internet

Controller node:
ens160: 10.0.0.100
ens192: unnumbered

Compute node:
ens160: 10.0.0.120
ens192: unnumbered

Pinging from the gateway to 192.168.178.155 results in no ping replies, and this tcpdump fragment and ARP cache contents:

09:27:24.728856 arp who-has 192.168.178.155 tell 192.168.178.253
09:27:24.729116 arp reply 192.168.178.155 is-at fa:16:3e:55:6a:e5

192.168.178.155                      fa:16:3e:55:6a:e5    re0 11m31s

...which is the MAC address of the router gateway interface, as shown in the output of neutron router-port-list below. Pinging from my Windows desktop (which is also in the untagged VLAN and the 192.168.178.0/24 subnet) gives the same result: an ARP cache entry, but no replies to the ping:

192.168.178.155       fa-16-3e-55-6a-e5     dynamic

The OpenStack network configuration is as follows:

joskam@controller:~$ openstack service list
+----------------------------------+----------+----------+
| ID                               | Name     | Type     |
+----------------------------------+----------+----------+
| 5b8761e7b63e49428254dd21ad546b93 | nova ...

Comments

Can you ping the other direction? Ping from the router to the gateway? You can get into the router using network namespaces. Do:

# ip netns exec 30c10e23-5d2f-4c71-a305-ad54ce8d92e6 ping 192.168.178.253

You can also do things like checking the routing table and interfaces inside that namespace.

vern (2016-12-05 17:00:35 -0500)

Thanks. While digging further, I found that pinging the router from within its own namespace did work. After running tcpdump on several more interfaces, the realization sank in that I was seeing traffic from the LAN, but only broadcast traffic, so the cause had to be the vSwitch somehow.

Jurjen (2016-12-06 13:38:28 -0500)

3 answers


answered 2016-12-06 13:34:22 -0500

Jurjen

I have found the answer to the problem myself. The solution is to enable the "Allow Promiscuous Mode" policy on the port group of the ESXi vSwitch where the interfaces of the OpenStack public network are connected. See below for the complete story, background and some words about the negative performance impact of setting this policy.

As for the complete story: it turns out two assumptions I had were wrong. The first was that virtual ports on an ESXi vSwitch perform MAC learning just like normal switch ports do. It turns out that isn't the case. Since ESXi knows the MAC address of the virtualized Ethernet adapter, it can use that information in the virtual switch: there's simply no need to do MAC learning. By setting the "Allow forged transmits" policy I allowed the virtual Ethernet adapter to send frames with different source MAC addresses, but frames coming in from the network addressed to any of those MAC addresses were dropped by the vSwitch - while I was under the assumption that these were delivered.

This explains what I've seen quite nicely: when a host on the physical network pings the OpenStack router interface, the host will send out an ARP request for the OpenStack router interface IP address. Since an ARP request is an Ethernet broadcast, the vSwitch will send it to all virtual switchports in the segment. So, the router interface sees it, and will reply to it with the MAC address OpenStack made up for that router interface. The physical host sees the ARP response, and will start sending ICMP Echo Requests to the MAC address it just found out (instead of sending as an Ethernet broadcast). As explained above, the vSwitch will simply drop these frames so communication will fail.
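To make the sequence above concrete, here is a toy Python model of the forwarding decision. It is only a sketch of the behaviour described in this answer, not of how ESXi is implemented: the class names, the port name and the vNIC MAC 00:0c:29:aa:bb:cc are made up for illustration, and the promiscuous flag is assumed to mean "this port also receives frames addressed to other MACs". Only the router MAC fa:16:3e:55:6a:e5 comes from the question.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class VPort:
    """One virtual switch port (hypothetical model, not an ESXi API)."""
    name: str
    vnic_mac: str              # MAC the hypervisor assigned to the virtual NIC
    promiscuous: bool = False  # assumed effect of "Allow Promiscuous Mode"
    received: list = field(default_factory=list)


class ToyVSwitch:
    """Forwards on the statically known vNIC MAC only; it never learns MACs."""

    def __init__(self, ports):
        self.ports = ports

    def deliver(self, dst_mac, payload):
        """Deliver a frame coming from the uplink; return the receiving port names."""
        hits = []
        for port in self.ports:
            if dst_mac == BROADCAST or dst_mac == port.vnic_mac or port.promiscuous:
                port.received.append(payload)
                hits.append(port.name)
        return hits


ROUTER_MAC = "fa:16:3e:55:6a:e5"  # MAC of the OpenStack router interface (from the question)
controller = VPort("controller-ens192", vnic_mac="00:0c:29:aa:bb:cc")  # made-up vNIC MAC
switch = ToyVSwitch([controller])

# The ARP request is a broadcast, so it reaches the controller's port.
arp_hits = switch.deliver(BROADCAST, "arp who-has 192.168.178.155")
# The ICMP echo request is unicast to a MAC the vSwitch does not know: dropped.
ping_hits = switch.deliver(ROUTER_MAC, "icmp echo request")
# With the promiscuous policy allowed on the port, the same frame is delivered.
controller.promiscuous = True
ping_hits_promisc = switch.deliver(ROUTER_MAC, "icmp echo request")

print(arp_hits, ping_hits, ping_hits_promisc)
```

The three results correspond to the three observations in this thread: ARP resolution works, the ping gets no reply, and the ping works once the promiscuous policy is allowed.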

My second incorrect assumption was that traffic from the controller node to the OpenStack router interface (also on the controller node) would remain entirely within the controller node, and that for that reason the vSwitch configuration would not have any effect on this sort of traffic. Again, this is not the case: I got bitten by Linux network namespaces here. A regular login shell runs in a different network namespace from the one the OpenStack router interface lives in. The login shell's namespace does not have a route to the OpenStack router interface, so it sends traffic for that address to the default gateway - which means it passes through the vSwitch port of the controller node and is subject to the same phenomenon described above: the vSwitch drops it.
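The namespace effect can be sketched as two separate routing tables and a longest-prefix lookup. This is an illustration only: the tables below are assumptions reconstructed from the addresses in the question (in particular, that the controller's default route points at 10.0.0.253), and "qg-example" is a placeholder interface name, not output from a real system.

```python
import ipaddress


def lookup(routes, dst):
    """Longest-prefix match: return (network, next_hop) of the route chosen for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    return max(matches, key=lambda match: match[0].prefixlen)


# Assumed routing table of the namespace a login shell runs in on the controller.
root_ns = [
    ("10.0.0.0/24", "on-link via ens160"),
    ("0.0.0.0/0", "via 10.0.0.253"),
]

# Assumed routing table inside the router's namespace; "qg-example" is a placeholder.
router_ns = [
    ("192.168.178.0/24", "on-link via qg-example"),
    ("0.0.0.0/0", "via 192.168.178.253"),
]

ROUTER_IP = "192.168.178.155"
root_choice = lookup(root_ns, ROUTER_IP)
router_choice = lookup(router_ns, ROUTER_IP)

print("login shell namespace:", root_choice[1])
print("router namespace:     ", router_choice[1])
```

From the login shell's namespace only the default route matches, so the packet leaves the VM towards 10.0.0.253 and has to come back in through the vSwitch. Inside the router's namespace the address is on a directly connected subnet, which fits the observation in the comments that pinging from within that namespace worked.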

The solution turned out to be very simple: configure the "Allow Promiscuous Mode" policy for the ESXi vSwitch port group where the OpenStack public network lives. After doing this, everything worked as it should.

Note that there are downsides to this, since Promiscuous Mode on a vSwitch probably does not do what you think it does (unless you're familiar with vSwitches already). Promiscuous Mode doesn't enable MAC address learning or something ...


answered 2017-07-13 02:18:09 -0500

Thanks Jurjen,

I had the same issue. After enabling Promiscuous Mode on the vSwitch, the Neutron qrouter works as expected.

Again thank you very much.


answered 2017-06-30 07:03:47 -0500

Tiago Farias

Thanks Jurjen!!!

After 10 days of working on it with no success, I was about to give up on OpenStack. You saved me: my controller was running under VMware and I was not able to find the problem by myself.


Stats

Asked: 2016-12-02 04:00:05 -0500

Seen: 748 times

Last updated: Jun 30 '17