openstack kolla - HA databases unable to start

asked 2018-06-11 01:59:52 -0500

theque42

I've installed an 8-node cloud with OpenStack kolla-ansible, with dual controllers.

It all works fine until I stop and start it (which relates to my earlier question on stopping a cloud).

On boot, the MariaDB/Galera components don't seem to be able to start or sync.

I started the first controller, and the MariaDB log says (beginning with the previous shutdown):

2018-06-08 17:26:00 140662590556928 [Note] InnoDB: Online DDL : Completed
2018-06-08 20:12:08 140664414825216 [Note] /usr/libexec/mysqld: Normal shutdown

2018-06-08 20:12:08 140664414825216 [Note] WSREP: Stop replication
2018-06-08 20:12:08 140664414825216 [Note] WSREP: Closing send monitor...
2018-06-08 20:12:08 140664414825216 [Note] WSREP: Closed send monitor.
2018-06-08 20:12:11 140664414825216 [Note] WSREP: gcomm: terminating thread
2018-06-08 20:12:11 140664414825216 [Note] WSREP: gcomm: joining thread
2018-06-08 20:12:11 140664414825216 [Note] WSREP: gcomm: closing backend
2018-06-08 20:12:15 140664414825216 [Note] WSREP: (958d6a22, 'tcp://172.16.101.100:4567') turning message relay requesting on, nonlive peers: tcp://172.16.101.109:4567
2018-06-08 20:12:16 140664414825216 [Note] WSREP: (958d6a22, 'tcp://172.16.101.100:4567') reconnecting to 8361901d (tcp://172.16.101.109:4567), attempt 0
180611 08:44:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/
180611 08:44:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery.zFLp3L' --pid-file='/var/lib/mysql//ctrl1.lab1.stack-recover.pid'
2018-06-11  8:44:29 139883731843264 [Note] /usr/libexec/mysqld (mysqld 10.1.20-MariaDB) starting as process 184 ...
180611 08:44:51 mysqld_safe WSREP: Recovered position 6f86c600-6b10-11e8-97f6-9b647609adad:43931
2018-06-11  8:44:52 140457847679168 [Note] /usr/libexec/mysqld (mysqld 10.1.20-MariaDB) starting as process 220 ...
2018-06-11  8:44:52 140457847679168 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-06-11  8:44:52 140457847679168 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2018-06-11  8:44:52 140457847679168 [Note] WSREP: wsrep_load(): Galera 3.16(r5c765eb) by Codership Oy <info@codership.com> loaded successfully.
2018-06-11  8:44:52 140457847679168 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2018-06-11  8:44:52 140457847679168 [Note] WSREP: Found saved state: 6f86c600-6b10-11e8-97f6-9b647609adad:-1
2018-06-11  8:44:52 140457847679168 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.16.101.100; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://172.16.101.100:4567; gmcast.segment = 0; gmcast.version = 0; ist.recv_addr = 172.16.1
2018-06-11  8:44:52 140457752131328 [Note] WSREP: Service thread ...

1 answer

answered 2018-06-11 03:49:36 -0500

Hi, Galera has to be stopped manually and in the proper order. kolla-ansible stop alone is not enough for that.

Find the master node in your cluster, stop the slaves first, and stop the master last. To start the cluster again, start the master first, then the slaves.

If this is not done, the cluster breaks because the slaves do not have the last writes.
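As a rough illustration of that ordering (this is not part of kolla-ansible; the helper functions and the host name ctrl2 are made up for the example), the stop and start sequences simply mirror each other:

```python
def stop_order(nodes, master):
    """Return the nodes in the order to stop them: slaves first, master last."""
    if master not in nodes:
        raise ValueError(f"{master!r} is not one of the cluster nodes")
    slaves = [n for n in nodes if n != master]
    return slaves + [master]


def start_order(nodes, master):
    """Return the nodes in the order to start them: master first, then slaves."""
    return list(reversed(stop_order(nodes, master)))


if __name__ == "__main__":
    # Hypothetical two-controller setup; ctrl1 is assumed to be the master.
    nodes = ["ctrl1", "ctrl2"]
    print("stop: ", stop_order(nodes, "ctrl1"))
    print("start:", start_order(nodes, "ctrl1"))
```

The only point of the sketch is that whichever node is stopped last must be the one started first.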

If it is already broken, stop your mariadb containers and run kolla-ansible -i inventory mariadb_recovery.

This command will find the last master, and start the database recovery process.
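For illustration only: one way to see which node holds the most recent data is to compare the positions that mysqld_safe reports in lines of the form "WSREP: Recovered position UUID:SEQNO", like the one in the log in the question. The Python sketch below does only that comparison. It is an assumption about the idea behind the recovery, not the actual mariadb_recovery code, and the ctrl2 host and its seqno are invented for the example.

```python
import re

# Matches lines such as:
# 180611 08:44:51 mysqld_safe WSREP: Recovered position 6f86...adad:43931
RECOVERED = re.compile(r"WSREP: Recovered position ([0-9a-f-]+):(-?\d+)")


def recovered_position(log_text):
    """Return (cluster_uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def most_advanced(logs_by_host):
    """Return the host whose log reports the highest recovered seqno.

    logs_by_host maps a host name to the text of its MariaDB log.
    Hosts without a recovered position are ignored.
    """
    positions = {}
    for host, text in logs_by_host.items():
        pos = recovered_position(text)
        if pos is not None:
            positions[host] = pos
    if not positions:
        raise ValueError("no recovered position found in any log")
    if len({uuid for uuid, _ in positions.values()}) > 1:
        raise ValueError("hosts report different cluster UUIDs")
    return max(positions, key=lambda host: positions[host][1])


if __name__ == "__main__":
    uuid = "6f86c600-6b10-11e8-97f6-9b647609adad"
    logs = {
        # Line taken from the log in the question.
        "ctrl1": f"180611 08:44:51 mysqld_safe WSREP: Recovered position {uuid}:43931",
        # Hypothetical second controller with an invented, older seqno.
        "ctrl2": f"180611 08:45:10 mysqld_safe WSREP: Recovered position {uuid}:43925",
    }
    print(most_advanced(logs))
```

The node with the highest seqno is the one that should come up first; the others then sync from it.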

Regards


Comments

Since you (well deservedly) set me up for an RTFM last time, I tried harder this time, but I can't find any mention of a "kolla-ansible STOP" command in the list at:

https://docs.openstack.org/kolla-ansi...

A million thanks for your help

theque42 (2018-06-11 07:05:00 -0500)

Sorry, it is undocumented; I will add a patch to the docs later. https://github.com/openstack/kolla-ansible/blob/master/tools/kolla-ansible#L66

See kolla-ansible --help

Eduardo Gonzalez (2018-06-11 08:14:59 -0500)

