Haproxy/Galera shared, cannot connect over VIP

asked 2017-11-23 05:40:41 -0500

updated 2017-12-01 05:26:38 -0500

Summary: I have a highly available database cluster using Galera and haproxy over corosync/pacemaker. I can connect using a node's actual IP address, but not using the virtual IP.

The long and full explanation, with relevant configuration files

First off, there are some similar problems to be found on the internet, although none with the exact configuration I have. The most similar existing question/answer is this one: https://ask.openstack.org/en/question/25868/ha-not-able-to-connect-with-virtualip/

There are some subtle differences:

My configuration: 3 servers running as controllers, running all the OpenStack services on bare metal. That includes haproxy, corosync, and pacemaker. I.e. the database hosts are also the haproxy hosts.

(We want high availability and no split-brain risk, but have only 5 machines available.)

I'm following the default installation guide under https://docs.openstack.org/ha-guide/, installing the current stable version of OpenStack on 5 machines running Debian 9.

We have a VLAN-capable switch, so additional networks beyond the two NICs available to each machine can be set up that way.

The machines have a network set up for haproxy, and one IP on it is configured as the virtual IP (VIP). I can connect to the VIP from any controller and verify that it is currently held by the first machine. I can SSH to it as well, modify a file, and check that this succeeds. I have a working, running Galera cluster. I can connect to a node's own address with, say:

mysql -h -D keystone -u keystone -p -P 3306

This works from all machines. (I have already implemented part of the 'keystone config' from the HA guide.) I can connect, view my empty keystone database, and do operations on it. These get executed on all cluster nodes.

However, once I try the same thing against the VIP:

mysql -h -D keystone -u keystone -p -P 3306

I get this error:

ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0 "Internal error/check (Not system error)"

This apparently is a fairly generic 'I could not connect' error. It normally reports a system error code as the reason, but in my case that code is 0, or 'sorry, we don't know why'.
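For context: a MySQL/MariaDB server speaks first, sending a handshake packet as soon as a client connects, and this error is raised when the client fails while reading that packet. A minimal Python sketch, not part of my actual setup, that tells apart "connection refused", "accepted but closed without sending anything", and "greeting received". The function name `probe_greeting` and the result labels are made up for illustration; host and port are whatever you pass in.

```python
import socket

def probe_greeting(host, port, timeout=3.0):
    """Connect and report whether the peer sends any bytes before closing.

    Returns "refused", "timeout", "closed-without-greeting" or "greeting".
    (Hypothetical helper; the labels are illustrative only.)
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    with sock:
        try:
            # A MySQL server sends its handshake without being asked.
            data = sock.recv(128)
        except socket.timeout:
            return "timeout"
        except ConnectionResetError:
            return "closed-without-greeting"
    # An empty read means the peer closed the connection cleanly.
    return "greeting" if data else "closed-without-greeting"
```

Running this against a node address and then against the VIP should show whether the proxy accepts the connection and then drops it before any backend has answered.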

We can run some additional shell commands to do some checks. Here is some additional information:

root@st01:/etc/mysql/mariadb.conf.d# telnet 3306
Connected to
Escape character is '^]'.
^C Connection closed by foreign host.
root@st01:/etc/mysql/mariadb.conf.d# telnet 3306
Connected to
Escape character is '^]'.
Connection closed by foreign host.
root@st01:/etc/mysql/mariadb.conf.d# ip route get 10 ...
2 answers

answered 2017-12-01 05:27:25 -0500

I seem to have found the issue: it's the port 9200 parts in the galera_cluster stanzas of the /etc/haproxy/haproxy.cfg file. These are meant for use with 'clustercheck'. Removing them makes everything work.
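My understanding, which I have not verified beyond the fix above, is that a port 9200 check makes haproxy health-check each backend on port 9200 rather than on the MySQL port. If nothing answers there, haproxy would likely mark every backend as down and close client connections on the VIP right after accepting them. A minimal Python sketch to see what a node's check port returns, assuming the check is served over HTTP the way the clustercheck script does it (200 when the node is synced, 503 otherwise). The function name `check_port_status` is hypothetical.

```python
import http.client

def check_port_status(host, port=9200, timeout=3.0):
    """Return the HTTP status served on the health-check port.

    Returns None if the port cannot be reached or does not speak HTTP.
    (Hypothetical helper; port 9200 is the value from haproxy.cfg.)
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or the reply was not valid HTTP.
        return None
    finally:
        conn.close()
```

If this returns None for every node, the health check itself is the likely reason the VIP drops connections, and either removing the port 9200 parts or actually exposing clustercheck on that port should help.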

answered 2018-03-08 21:13:18 -0500

Hi, I had the same issue as you, though maybe a little bit different. How did you resolve it?

