Haproxy/Galera shared, cannot connect over VIP

asked 2017-11-23 05:40:41 -0500

netproducts

updated 2017-12-01 05:26:38 -0500

Summary: I have a highly available database cluster using Galera and HAProxy over Corosync/Pacemaker. I can connect to the database using a node's actual IP address, but not using the virtual IP (VIP).

The long and full explanation, with relevant configuration files

First off, there are some similar problems to be found on the internet, although none with my exact configuration. The existing question most similar to mine is this one: https://ask.openstack.org/en/question/25868/ha-not-able-to-connect-with-virtualip/

There are some subtle differences:

My configuration: 3 servers running as controllers, running all the OpenStack services on bare metal. That includes HAProxy, Corosync, and Pacemaker. In other words, the database hosts are also the HAProxy hosts.

(We want high availability and no split-brain risk, but have only 5 machines available.)

I'm following the default installation guide at https://docs.openstack.org/ha-guide/, installing the current stable version of OpenStack on 5 machines running Debian 9.

We have a VLAN-capable switch, so additional networks beyond the two NICs available on each machine can be set up as VLANs.

The machines have a network set up for HAProxy: 10.0.44.0/24. The IP 10.0.44.250 was set as the virtual IP (VIP) address. I can connect from any controller (10.0.44.1, 10.0.44.2, 10.0.44.5) to 10.0.44.250 and verify that it is currently held by the first machine. I can SSH to it as well, modify a file, and check that this succeeds. I have a working, running Galera cluster. I can connect with, say:

mysql -h 10.0.44.1 -D keystone -u keystone -p -P 3306

This works from all machines. (I have already implemented part of the ‘keystone config’ from the HA guide.) I can connect, view my empty keystone database, and run operations on it. These get executed on all cluster nodes.

However, when I try this:

mysql -h 10.0.44.250 -D keystone -u keystone -p -P 3306

this error occurs:

ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0 "Internal error/check (Not system error)"

This is apparently a standard 'I could not connect' error. It normally supplies a reason through the system error code, but in my case that code is 0, or 'sorry, we don't know why'.

Some additional checks from the shell:

root@st01:/etc/mysql/mariadb.conf.d# telnet 10.0.44.1 3306
Trying 10.0.44.1...
Connected to 10.0.44.1.
Escape character is '^]'.
5.5.5-10.1.26-MariaDB-0+deb9u1<sB.]l'xJ-?▒GfL)&#}m**1Xmysql_native_password
^C Connection closed by foreign host.
root@st01:/etc/mysql/mariadb.conf.d# telnet 10.0.44.250 3306
Trying 10.0.44.250...
Connected to 10.0.44.250.
Escape character is '^]'.
Connection closed by foreign host.
root@st01:/etc/mysql/mariadb.conf.d# ip route get 10 ...
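
The two telnet sessions above differ in one telling way: the node address sends the MariaDB greeting, while the VIP accepts the TCP connection and then closes it without sending anything. The sketch below automates that comparison. It is a minimal illustration and not part of the original post; `classify_endpoint` and its return labels are hypothetical names. An accept-then-close on the VIP is typically what a proxy does when it has no usable backend, but that reading is an assumption here, not something the transcript proves.

```python
import socket


def classify_endpoint(host, port=3306, timeout=3.0):
    """Connect to host:port and report what happens before we send anything.

    Returns one of (hypothetical labels):
      "greeting"    - the peer sent bytes first, as a MySQL/MariaDB server does
      "closed"      - the connection was accepted, then closed with no data
      "silent"      - the connection stayed open but nothing arrived in time
      "unreachable" - the TCP connection itself failed
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            data = sock.recv(128)
        except socket.timeout:
            return "silent"
        except OSError:
            # For example, a connection reset by the peer.
            return "closed"
    return "greeting" if data else "closed"
```

Run against the addresses in this question, `classify_endpoint("10.0.44.1")` would be expected to report a greeting and `classify_endpoint("10.0.44.250")` a closed connection, matching the telnet output.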

2 answers


answered 2018-03-08 21:13:18 -0500

Hi, I had the same issue as you, or perhaps a slightly different one. How did you resolve it?


answered 2017-12-01 05:27:25 -0500

netproducts

I seem to have found the issue: it's the port 9200 parts in the galera_cluster stanzas of the /etc/haproxy/haproxy.cfg file. These are for use with 'clustercheck'. Removing them makes everything work.
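
One plausible reading of this fix (an assumption, not confirmed in the thread) is that HAProxy was sending its health checks to port 9200 on each node, nothing healthy was answering there, and so every backend was marked down. The sketch below probes that port directly. `probe_health_port` is a hypothetical helper, and the expectation that a clustercheck responder answers with HTTP 200 when the node is synced and 503 otherwise reflects the commonly distributed clustercheck script, not anything shown in this post.

```python
import http.client


def probe_health_port(host, port=9200, timeout=3.0):
    """Return the HTTP status served on the health-check port.

    Returns None if nothing listens on the port or the reply is not
    valid HTTP, which is how a node would look to a proxy whose
    checks are pointed at a port with no responder behind it.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

Calling `probe_health_port` for 10.0.44.1, 10.0.44.2, and 10.0.44.5 would show whether a responder exists on each node. If every call returns None, the alternative to removing the port 9200 parts is to set up the clustercheck responder that those options expect.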



Stats

Asked: 2017-11-23 05:34:28 -0500

Seen: 754 times

Last updated: Dec 01 '17