I have found the answer to the problem myself. The solution is to enable the "Allow Promiscuous Mode" policy on the port group of the ESXi vSwitch where the interfaces of the OpenStack public network are connected. See below for the complete story, background and some words about the negative performance impact of setting this policy.

As for the complete story: two assumptions I had turned out to be wrong. The first was that virtual ports on an ESXi vSwitch perform MAC learning just like normal switch ports do. That isn't the case. Since ESXi knows the MAC address of each virtualized Ethernet adapter, it can use that information in the virtual switch directly: there's simply no need to do MAC learning. By setting the "Allow forged transmits" policy I allowed the virtual Ethernet adapter to send frames with different source MAC addresses, but frames coming in from the network addressed to those MAC addresses were dropped by the vSwitch - while I was under the assumption that these were delivered.

This explains what I've seen quite nicely: when a host on the physical network pings the OpenStack router interface, the host first sends out an ARP request for the OpenStack router interface IP address. Since an ARP request is an Ethernet broadcast, the vSwitch sends it to all virtual switch ports in the segment. So the router interface sees it and replies with the MAC address OpenStack made up for that router interface. The physical host sees the ARP response and starts sending ICMP Echo Requests as unicast frames to the MAC address it just learned (rather than as Ethernet broadcasts). As explained above, the vSwitch simply drops these frames, so communication fails.
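
To make the sequence above concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour as I understand it, not how ESXi is actually implemented: the class names, port names and MAC addresses are all made up for illustration.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class VPort:
    """A virtual switch port (hypothetical model, not an ESXi API)."""
    name: str
    vnic_mac: str  # the MAC the hypervisor assigned to the attached vNIC
    received: list = field(default_factory=list)


class ToyVSwitch:
    """Forwards only on the vNIC MACs it already knows; it never learns."""

    def __init__(self, ports, promiscuous=False):
        self.ports = ports
        self.promiscuous = promiscuous

    def deliver(self, src_port_name, dst_mac, payload):
        """Return the names of the ports that get a copy of the frame."""
        delivered = []
        for port in self.ports:
            if port.name == src_port_name:
                continue  # never send a frame back out of its ingress port
            if dst_mac in (BROADCAST, port.vnic_mac) or self.promiscuous:
                port.received.append(payload)
                delivered.append(port.name)
        return delivered


# Made-up addresses: two VMs, plus the MAC OpenStack invented for the
# router interface that lives behind the controller's vNIC.
ROUTER_MAC = "fa:16:3e:11:22:33"
ports = [VPort("controller", "00:50:56:aa:bb:01"),
         VPort("compute", "00:50:56:aa:bb:02")]

switch = ToyVSwitch(ports)
arp = switch.deliver("uplink", BROADCAST, "ARP who-has router IP")
icmp = switch.deliver("uplink", ROUTER_MAC, "ICMP echo request")
print(arp)   # ['controller', 'compute']
print(icmp)  # []

promisc = ToyVSwitch(ports, promiscuous=True)
icmp_promisc = promisc.deliver("uplink", ROUTER_MAC, "ICMP echo request")
print(icmp_promisc)  # ['controller', 'compute']
```

The broadcast ARP request reaches every port, the unicast Echo Request to the made-up router MAC reaches none, and with the promiscuous flag set it reaches all of them.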

My second incorrect assumption was that traffic from the controller node to the OpenStack router interface (also on the controller node) would remain entirely within the controller node, and that for that reason the vSwitch configuration would not have any effect on this sort of traffic. Again, this is not the case: I got bitten by Linux network namespaces here. A regular login shell runs in a different network namespace than the one the OpenStack router interface lives in. The namespace of the login shell does not have a route to the OpenStack router interface, so it sends traffic for that address to the default gateway - which means it passes through the vSwitch port of the controller node and is subject to the same phenomenon described above: the vSwitch drops it.
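
The routing decision described above can be sketched with Python's `ipaddress` module. This is a simplified longest-prefix lookup, not the kernel's actual routing code, and the routing tables and addresses are assumptions (documentation ranges), not the ones from my setup.

```python
import ipaddress
from typing import Optional


def next_hop(routes, destination) -> Optional[str]:
    """Longest-prefix match over (network, via) pairs.

    via=None means the network is directly connected. Returns "direct",
    the gateway address, or None when no route matches.
    """
    dst = ipaddress.ip_address(destination)
    matches = []
    for net, via in routes:
        network = ipaddress.ip_network(net)
        if dst in network:
            matches.append((network, via))
    if not matches:
        return None
    _, via = max(matches, key=lambda m: m[0].prefixlen)
    return "direct" if via is None else via


# Hypothetical tables: the login shell's namespace only knows the
# management network, the router namespace is on the public network.
login_shell_routes = [("192.0.2.0/24", None), ("0.0.0.0/0", "192.0.2.1")]
router_ns_routes = [("203.0.113.0/24", None), ("0.0.0.0/0", "203.0.113.1")]

ROUTER_IP = "203.0.113.10"  # made-up OpenStack router interface address
print(next_hop(login_shell_routes, ROUTER_IP))  # 192.0.2.1
print(next_hop(router_ns_routes, ROUTER_IP))    # direct
```

From the login shell's namespace the router address is only reachable via the default gateway, so the frames leave the node. On a real node, `ip netns list` and `ip netns exec <namespace> ip route` show the actual namespaces and their routing tables.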

The solution turned out to be very simple: enable the "Allow Promiscuous Mode" policy on the ESXi vSwitch port group where the OpenStack public network lives. After doing this, everything worked as it should.

Note that there are downsides to this, since Promiscuous Mode on a vSwitch probably does not do what you think it does (unless you're familiar with vSwitches already). Promiscuous Mode does not enable MAC address learning or anything like it: it means that all traffic is duplicated and sent to all virtual network ports affected by the policy. This allows a Virtual Machine to set its virtualized Ethernet adapter to promiscuous mode and see all traffic traversing the vSwitch. Just setting a vSwitch, or a port group within it, to Promiscuous Mode can have quite a performance impact. (See http://www.virtuallyghetto.com/2014/08/new-vmware-fling-to-improve-networkcpu-performance-when-using-promiscuous-mode-for-nested-esxi.html for a hack that implements a rudimentary MAC address learning mechanism, so that this traffic duplication does not happen. It is not advisable for production systems, but can be helpful in test or experimental setups.)
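
To illustrate the difference between duplicating everything and MAC learning, here is a small Python sketch. It shows the textbook learning-switch idea only; it is not the code of the Fling linked above, and the port names and MAC addresses are invented.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Textbook MAC learning: remember which port each source MAC was seen on."""

    def __init__(self, port_names):
        self.port_names = list(port_names)
        self.mac_table = {}  # MAC address -> port name

    def forward(self, in_port, src_mac, dst_mac):
        """Learn the source, then return the ports that get a copy."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown or broadcast destination: flood to every other port.
            return [p for p in self.port_names if p != in_port]
        return [] if out_port == in_port else [out_port]


def promiscuous_forward(port_names, in_port):
    """Promiscuous port group: every other port gets a copy of every frame."""
    return [p for p in port_names if p != in_port]


HOST_MAC = "00:11:22:33:44:55"    # made-up physical host
ROUTER_MAC = "fa:16:3e:11:22:33"  # made-up OpenStack router interface MAC
ports = ["uplink", "vm1", "vm2", "vm3"]

sw = LearningSwitch(ports)
arp_request = sw.forward("uplink", HOST_MAC, BROADCAST)
arp_reply = sw.forward("vm1", ROUTER_MAC, HOST_MAC)
echo = sw.forward("uplink", HOST_MAC, ROUTER_MAC)
echo_promisc = promiscuous_forward(ports, "uplink")

print(arp_request)   # ['vm1', 'vm2', 'vm3']
print(arp_reply)     # ['uplink']
print(echo)          # ['vm1']
print(echo_promisc)  # ['vm1', 'vm2', 'vm3']
```

Once the router MAC has been seen on `vm1`, the learning switch sends each Echo Request to that one port, while the promiscuous variant keeps copying every frame to all three VMs. That extra copying is where the performance cost comes from.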