Neutron Router HA: Failover Issues

Hi, I have a really strange problem that I cannot seem to get to the bottom of. I run the L3 agent on (currently) two network nodes and enable HA by default for every router.

Everything seems to work, but from time to time weird stuff happens: I have an instance with a floating IP assigned and a web server running on it. When I make an HTTP request from the outside, I can see traffic going into the virtual router on network node 0 (net00) and leaving the router towards the instance. The next thing I see is the response coming from the instance, BUT this response traffic hits network node 1 (net01)!

This is the "inner" interface (towards the instance) on net00:

qr-4858d71b-cf Link encap:Ethernet  HWaddr fa:16:3e:90:1e:a3  
          inet addr:172.16.100.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe90:1ea3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:8950  Metric:1
          RX packets:10818 errors:0 dropped:15 overruns:0 frame:0
          TX packets:13010 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2295246 (2.2 MB)  TX bytes:2868936 (2.8 MB)

This is on net01:

qr-4858d71b-cf Link encap:Ethernet  HWaddr fa:16:3e:90:1e:a3  
          UP BROADCAST RUNNING MULTICAST  MTU:8950  Metric:1
          RX packets:930414 errors:0 dropped:44 overruns:0 frame:0
          TX packets:433183 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1212518844 (1.2 GB)  TX bytes:200079736 (200.0 MB)

You can see that the gateway IP (172.16.100.1) sits on net00, but as I said, response traffic from the instance arrives on net01. Of course, the response never reaches the client that requested the website.
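In case it helps with comparing the two outputs, here is a minimal Python sketch that checks which node's qr- interface carries an IPv4 address and whether the MACs match. The helper names (`ipv4_addr`, `hwaddr`) are made up for illustration, and the sample strings are just the first lines of the ifconfig output pasted above; it assumes the old net-tools `ifconfig` format.

```python
import re

def ipv4_addr(ifconfig_text):
    """Return the IPv4 address in net-tools ifconfig output, or None."""
    match = re.search(r"inet addr:(\S+)", ifconfig_text)
    return match.group(1) if match else None

def hwaddr(ifconfig_text):
    """Return the lower-cased MAC address in ifconfig output, or None."""
    match = re.search(r"HWaddr\s+([0-9A-Fa-f:]{17})", ifconfig_text)
    return match.group(1).lower() if match else None

# Shortened copies of the output shown above (assumption: same format).
NET00 = """qr-4858d71b-cf Link encap:Ethernet  HWaddr fa:16:3e:90:1e:a3
          inet addr:172.16.100.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe90:1ea3/64 Scope:Link"""
NET01 = """qr-4858d71b-cf Link encap:Ethernet  HWaddr fa:16:3e:90:1e:a3
          UP BROADCAST RUNNING MULTICAST  MTU:8950  Metric:1"""

outputs = {"net00": NET00, "net01": NET01}

# Nodes whose qr- interface currently has the gateway IP configured.
holders = [node for node, text in outputs.items() if ipv4_addr(text)]
# Distinct MAC addresses seen across both nodes.
macs = {hwaddr(text) for text in outputs.values()}

print("gateway IP present on:", holders)
print("distinct MACs:", macs)
```

With the output above, this reports the gateway IP only on net00 and a single shared MAC, which matches what I see by eye.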

I also noticed that the qr- interfaces of the two vrouters have the same MAC address! My first instinct was to reboot the instance to clear its ARP cache, but once I saw this it made sense that clearing the ARP cache didn't work.

Are the MACs supposed to be equal? If yes, what's the solution to this?