What is the best solution for Neutron L3 HA (up to Kilo)?

asked 2015-10-19 03:37:03 -0600

Maple Wang

I'm studying the networking options in OpenStack Kilo, where L3 HA is a hot topic. As far as I know, there are three general approaches:

1) VR rescheduling: this is the built-in approach in Kilo for centralized virtual routers (VRs). Once a VR goes down, its namespace is rescheduled to another working L3 agent, but the failover is slow (long downtime) and there is no load balancing.

2) VRRP: the VRs are managed by keepalived using VRRP. Once the master VR goes down, the master role and its VIP move to a standby VR, and VRRP groups can provide load balancing. However, I have not seen any VRRP group settings in the current Kilo implementation. Or did I just miss them?

3) DVR: technically it is not an HA solution; it only distributes traffic from the network node to the compute nodes. But if the L3 agent on a given compute node fails, how is network traffic redirected to other nodes? What is the HA story for DVR?

So there are two aspects of a network solution that concern me: traffic distribution and high availability. As far as I can see, none of the approaches above in current Neutron delivers both.

Is there a better solution for real production environments? Or is there a blueprint in the community to address this?

Best regards.

answered 2015-10-19 03:59:15 -0600

dbaxps

updated 2016-03-13 05:08:51 -0600

UPDATE 03/14/2016
Seems to work for RDO Mitaka M3
Final revisions here
Testing via Delorean trunks (Mitaka M3)
HA support for DVR centralized default SNAT functionality on RDO Mitaka Milestone 3

Addressing point 3, "What's HA for DVR?":

L3 Agent support for routers with HA and DVR. The main difference for DVR HA routers is where the VRRP/keepalived logic is run and which ports fall in the HA domain for DVR. Instead of running in the qrouter namespace, keepalived will run inside the snat-namespace. Therefore only snat ports will fall under the control of the HA domain.

Regarding VRRP group per

VRRP groups: The VRRP header includes a Virtual Router Identifier, or VRID. Half of the network hosts will configure the first VIP, and the other half the second. In the case of a failure, the VIP previously found on the failing router will transfer to another one.
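To make the VRRP group idea above concrete, here is a rough sketch in Python. It is only a toy model of the election, not anything taken from Neutron or keepalived: the router names, VIP addresses and priority values are made up for illustration, and real VRRP also has advertisement timers and tie-breaking rules that are left out here.

```python
from dataclasses import dataclass


@dataclass
class VrrpGroup:
    """One VRRP group: a VRID, the VIP it protects, and per-router priorities.

    All names and values used below are hypothetical, for illustration only.
    """
    vrid: int
    vip: str
    priorities: dict  # router name -> priority (higher wins the election)

    def master(self, alive):
        """Return the live router with the highest priority, or None."""
        candidates = [r for r in self.priorities if r in alive]
        if not candidates:
            return None
        return max(candidates, key=lambda r: self.priorities[r])


def vip_owners(groups, alive):
    """Map each VIP to the router that currently holds it."""
    return {g.vip: g.master(alive) for g in groups}


# Two groups with mirrored priorities: each router is master for one VIP,
# so hosts split between the two VIPs share the load across both routers.
groups = [
    VrrpGroup(vrid=1, vip="10.0.0.1", priorities={"router-a": 100, "router-b": 50}),
    VrrpGroup(vrid=2, vip="10.0.0.2", priorities={"router-a": 50, "router-b": 100}),
]

# Normal operation: each router owns one VIP.
print(vip_owners(groups, alive={"router-a", "router-b"}))

# router-a fails: its VIP moves to router-b, which now holds both.
print(vip_owners(groups, alive={"router-b"}))
```

With both routers up, each one is master for one VIP; when router-a drops out of the alive set, router-b ends up holding both VIPs, which is the transfer described in the quote.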

Now focus on section "Back to Neutron-land"

That is the output for the master instance. The same router on another node would have no IP address on the ha, qr, or qg devices. It would have no floating IPs or routing entries. These are persisted as configuration values in keepalived.conf, and when keepalived detects the master instance failing, these addresses (i.e. the VIPs) are configured by keepalived on the appropriate devices.

Great, thanks for posting this info. Including SNAT in HA with keepalived makes sense, but what about the VRs distributed across the compute nodes, which is what really concerns me? Once the VR on a compute node fails, how is the traffic handled?

Maple Wang ( 2015-10-19 04:11:42 -0600 )

The first idea that comes to mind is that external traffic from VMs running on this node should be routed via the centralized SNAT (at least one router of the keepalived pair should be up).
The whole thread is here

dbaxps ( 2015-10-19 05:05:33 -0600 )

That's exactly what I wanted to know. I really appreciate your help.

Maple Wang ( 2015-10-19 22:54:12 -0600 )

I just don't see any other way outside. The patch is scheduled for Mitaka-m1, and I am not expecting upstream instructions for HA with DVR any time soon.

dbaxps ( 2015-10-20 01:45:02 -0600 )

Do we need to wait until at least "M" (Mitaka)? That will be in April next year.

Maple Wang ( 2015-10-20 02:07:09 -0600 )

