
Upgrade single-control environment to high availability

asked 2018-03-08 09:55:35 -0500 by eblock

Hi experts,

I have an existing cloud (Ocata) that grew from a demo into a production environment. Now I have to find a way to make it highly available, starting with the control node.

The plan is to leave the existing single controller up and running while I configure two new servers in HA mode with the Pike release in the meantime. There are two main aspects causing some headaches: the database and networking. I believe the database part could be tricky but manageable: stop MySQL at some point and dump the DB, then import it into the new control node(s) (maybe on shared storage) running a Galera cluster, and hope that it works.
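The dump-and-import step could look roughly like this; hostnames and paths below are placeholders, and the API services should be stopped first so the dump is consistent:

```shell
# On the old single controller: stop the API services writing to the DB,
# then take a consistent dump (file name and path are examples only).
mysqldump -u root -p --all-databases --single-transaction \
          --events --routines > /backup/openstack-db.sql

# Copy the dump to the first new controller and import it into the
# freshly bootstrapped Galera node; Galera replicates it to the others.
scp /backup/openstack-db.sql controller1:/tmp/
mysql -u root -p < /tmp/openstack-db.sql
```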

But what about Neutron and the self-service networks with all the virtual routers etc.? Is it even possible to recreate the Neutron environment on a different node? I read the guide on how to make Neutron HA if you start from scratch, but is my approach realistically possible?

I would really appreciate any insights. Has anyone here done this who could comment on my approach?

Thank you in advance for any help!


1 answer


answered 2018-03-08 17:55:17 -0500

A challenging task. I haven't done this myself, though I have experience with OpenStack HA setups "by design". Let's break it down by service type and network components, assuming you're familiar with the OpenStack HA guide and have an idea of how to configure the individual services:

  • Database - Bootstrapping a Galera cluster from a standalone node is a relatively uncomplicated task with only a short downtime needed. You put the Galera settings in the config, stop the service, then start it with service mysql bootstrap (assuming systemd on Ubuntu 16.04) or the more versatile variant service mysql start --wsrep-new-cluster. Then you start the other nodes, wait for SST (the initial data transfer) and finally restart the first node, so that it runs in "normal" mode rather than "bootstrap" mode, roughly speaking. Have a look at this writeup, for example. WARNING: As always, take a backup before doing anything. Bootstrapping against an empty node by mistake WILL wipe out your database.
  • Message queue - This one also requires some downtime, but you face almost no danger since OpenStack doesn't keep any long-lived persistent data in there. Just configure RabbitMQ according to the HA guide and make a cluster out of the controllers' instances.
  • Corosync - Virtual IP - The Virtual IP will be a new element in your cluster, so no breaking change here. Again, the HA guide has you covered. It's basically an IP address that is always up and present on whichever controller is working.
  • HAProxy and all the "listening" services behind it - Although it is possible to run services directly on the Virtual IP, I strongly recommend putting a load balancer between the Virtual IP and the services. The HA guide's HAProxy config is a very good start, but you'll have to reconfigure all the services that listen on all interfaces by default to listen only on the controller's management IP address. This is because HAProxy will be listening on the Virtual IP address, which would be impossible if the port were already taken. You'll need to search through each project's configuration reference; look for the keywords "bind" or "listen" (e.g. for Nova, it's osapi_compute_listen and metadata_listen for its two APIs).
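A minimal haproxy.cfg excerpt for one such service, assuming the VIP is 10.0.0.100 and the controllers' management IPs are 10.0.0.11/12 (all placeholders). The backends listen on their management IPs, which is exactly why the per-service listen options have to be changed: it keeps the VIP port free for HAProxy.

```
# haproxy.cfg excerpt -- Nova compute API behind the Virtual IP
listen nova_compute_api
  bind 10.0.0.100:8774
  balance source
  option tcpka
  server controller1 10.0.0.11:8774 check inter 2000 rise 2 fall 5
  server controller2 10.0.0.12:8774 check inter 2000 rise 2 fall 5
```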
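The Galera bootstrap sequence from the first bullet can be sketched as a command transcript (Ubuntu 16.04-style service commands, as mentioned above; verify against your distribution, and take that backup first):

```shell
# Node 1 (the one holding the data): bootstrap the new cluster.
service mysql stop
service mysql start --wsrep-new-cluster   # or: service mysql bootstrap

# Nodes 2 and 3: join the cluster and wait for SST to finish.
service mysql start

# Node 1 again: restart so it runs in normal mode, not bootstrap mode.
service mysql restart

# Sanity check: the cluster size should match your node count.
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
```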
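Forming the RabbitMQ cluster per the HA guide might look like this; the node name is a placeholder:

```shell
# On each new controller, after syncing the Erlang cookie
# (/var/lib/rabbitmq/.erlang.cookie) from the first node:
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@controller1
rabbitmqctl start_app

# Mirror all queues across the cluster (run once, on any node):
rabbitmqctl set_policy ha-all '^' '{"ha-mode":"all"}'
rabbitmqctl cluster_status
```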
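One way to create the Virtual IP as a Pacemaker resource, as the HA guide does; the address and netmask here are placeholders:

```shell
# Managed by Corosync/Pacemaker; the VIP follows the surviving controller.
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=10.0.0.100 cidr_netmask=24 op monitor interval=30s
```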

And then there's Neutron. This change is basically independent of your effort to make management highly available, so I advise you not to do it in the same run; concentrate on one thing and do it well.

The same goes for the OpenStack upgrade by the way - my personal approach would be: upgrade first, then do the management HA setup, then fiddle with Neutron.

In general, introducing a new network scenario to Neutron doesn't work without downtime (planned or not, sadly). I advise you to go through the relevant networking scenarios in the documentation first.


Comments

Thanks for your answer! Just to clarify: I have 2 new servers that I want to set up in HA mode, and the existing control node will be replaced by these two. We use Linux bridge in our environment, so this is the way to go.

eblock ( 2018-03-09 01:41:37 -0500 )

We already have existing self-service networks, although most VMs run in provider networks, so downtime is not a deal breaker. But just to clarify: after configuring Neutron according to the HA guide, will I have to recreate all existing virtual routers? What about existing ports etc.?

eblock ( 2018-03-09 01:44:20 -0500 )

Oh, okay. However, take into account that with Linux bridge + VRRP, all the north-south traffic (be it from a fixed IP or a floating IP) passes through a network node, which adds latency. It also introduces an additional point of failure for your instances' traffic, despite being HA.

Peter Slovak ( 2018-03-09 03:45:33 -0500 )

OVS + DVR, on the other hand, directs N-S traffic with a floating IP directly to/from the compute nodes. The SPOF there used to be SNAT traffic from fixed IPs, which still passes via the network nodes, but the VRRP enhancement solves this. Note that a network node and a controller can be the same physical node.
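For reference, the split described here maps to the `agent_mode` option in `l3_agent.ini`; a sketch, assuming the standard DVR option names:

```ini
# l3_agent.ini on network/controller nodes (handles SNAT for fixed IPs):
[DEFAULT]
agent_mode = dvr_snat

# l3_agent.ini on compute nodes (floating-IP traffic leaves locally):
# agent_mode = dvr
```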

Peter Slovak ( 2018-03-09 03:48:10 -0500 )

I'm not 100% sure about the router recreation, but according to this you have to shut a router down when adding VRRP. I would at least migrate the router to another node to make sure the L3 agent recreates it from scratch.
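For what it's worth, the disable-then-enable dance with the Ocata-era CLI would be something like this; the router name is a placeholder:

```shell
# A router must be admin-down before its HA attribute can be changed.
neutron router-update router1 --admin_state_up=False
neutron router-update router1 --ha=True
neutron router-update router1 --admin_state_up=True
```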

Peter Slovak ( 2018-03-09 03:51:57 -0500 )
