As I could see in the output of fdb/show, the switch was learning the wrong port number for the MAC addresses of the virtual machines connected to it.

I have now found out why.

We have two interfaces on the host, connected to different switches. They are bonded in active-passive mode, so packets are only sent out on one link, and whatever traffic comes in on the standby link is ignored. That works.

Now when a VM tries to connect to another host, it sends out an ARP packet, which eventually gets sent out over the active link of the bonded pair. Because switches flood broadcast traffic out of all their ports, the packet is received again on the standby link. I always assumed that it would simply be dropped there. But ovswitch puts both interfaces in promiscuous mode, which ignores the fact that they are bonded. And there was the root of our problem.

So server AAA wants to contact BBB, and sends out a "who has x.x.x.x, tell y.y.y.y" ARP packet, with source AAA and destination FFF.

Ovswitch first sees this packet on port 8, and enters "8 AAA" in its MAC table. The packet is flooded out all ports. Via port 1 it ends up on our switch, which floods it out all its ports, and then via the second switch the packet arrives back on port 1 of the ovswitch, which then changes "8 AAA" into "1 AAA". From then on, frames destined for AAA are forwarded out port 1 instead of to the VM on port 8. And this messes up everything...
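To make the sequence concrete, here is a minimal sketch of generic MAC learning in Python. It is not Open vSwitch code; the `LearningSwitch` class and its `receive` method are hypothetical names made up for illustration, and the port numbers and the "AAA" address are the ones from the description above.

```python
class LearningSwitch:
    """Toy model of a MAC-learning switch (illustration only, not OVS code)."""

    def __init__(self):
        # Maps source MAC address -> port on which it was last seen.
        self.mac_table = {}

    def receive(self, in_port, src_mac):
        """Learn src_mac on in_port and return the port it was known on before."""
        previous_port = self.mac_table.get(src_mac)
        self.mac_table[src_mac] = in_port
        return previous_port


ovs = LearningSwitch()

# 1. The VM's ARP broadcast enters on port 8, so "8 AAA" is learned.
ovs.receive(8, "AAA")
table_after_vm = dict(ovs.mac_table)

# 2. The same broadcast is flooded by the physical switches and comes back
#    in on port 1 (the standby link), so the entry is overwritten.
previous_port = ovs.receive(1, "AAA")

print("after VM sends ARP:   ", table_after_vm)
print("after broadcast loops:", ovs.mac_table, "(was on port", previous_port, ")")
```

The second `receive` call is the step that should not happen: a switch that knew the two links were a bonded pair would not relearn AAA from the standby link.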

I disabled one of the links in the bonded pair and the problem went away. But this is just a workaround.

I will have to do this differently. I'm thinking along the lines of moving the bonding into the ovswitch. But this requires a reconfiguration of the network, and I therefore first need to set up an out-of-band connection to the host.