
Keystone loses tokens in an HA setup with Galera [closed]

asked 2015-03-11 14:45:39 -0500

jpmethot

I am building a small HA OpenStack setup with a Galera database back end. Galera is configured active/active, but I may switch to active/passive, since active/active is known to cause deadlocks. Anyhow, the main problem is that the HA Keystone sometimes fails when it creates a token and then tries to retrieve it from the database.

For example, when I run glance image-list, about one time in ten it replies with:

Request returned failure status.
Invalid OpenStack Identity credentials.

If I then look at the Keystone logs, I find the following error:

2015-03-11 15:31:50.876 20559 WARNING keystone.common.wsgi [-] Could not find token, 35ddb3e08d694dad80f1c52ddba1da62.

I also used to get a "MySQL server has gone away" error, but it disappeared after I upgraded the kernel and the MariaDB version on the Galera servers.

If I search the database for the "lost" token, I do find it, so it is being written and replicated across my nodes. Another interesting fact: if I set HAProxy to send traffic to only one Galera node, Keystone has no trouble finding the token. I have also tested many HAProxy timeout values, and they do not seem to be the cause of this problem.

I'd really like to keep it active/active, since active/passive would make Galera fairly pointless (I'd use DRBD instead). Does anyone have an idea what could be causing this problem and how to fix it?


Closed for the following reason: the question is answered, right answer was accepted by jpmethot.
Close date: 2015-04-07 11:01:47.739265

4 answers


answered 2015-04-07 10:07:16 -0500

jpmethot

I guess I should have posted this earlier, but I did end up solving this issue. My initial question contained several misconceptions; I won't go through them one by one, but they should become apparent from how I fixed it.

Basically, I figured out that HAProxy kept flushing open connections while both OpenStack and MySQL still considered them active. Connections to the database would work for a while, but when OpenStack tried to reuse a previously opened connection, HAProxy had already dropped it.

To fix it, I ended up setting the HAProxy timeout obscenely high (something like 20 hours) to make sure it would not cut the connections. Lower values such as 5 or 10 minutes still triggered timeouts. I don't know whether this risks having too many connections open to MySQL, but so far the numbers have been fairly stable.
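A toy model of the mechanism described above may help: one pooled connection, a proxy that forgets connections idle for longer than its timeout, and an optional client-side recycle time. This is a hypothetical Python sketch; none of the names come from HAProxy, MySQL or OpenStack, and the timings are made up. It shows why a short proxy timeout breaks reused connections, and that either raising the proxy timeout or having the client retire idle connections sooner avoids the failures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    last_used: float  # seconds; time of the last query sent on this connection


def proxy_still_has(conn: Connection, now: float, proxy_timeout_s: float) -> bool:
    """Hypothetical proxy rule: a connection idle longer than the timeout is gone."""
    return now - conn.last_used <= proxy_timeout_s


def count_failures(query_times: List[float], proxy_timeout_s: float,
                   recycle_s: Optional[float] = None) -> int:
    """Replay queries on a single pooled connection and count how many land
    on a connection the proxy has already dropped."""
    conn: Optional[Connection] = None
    failures = 0
    for now in query_times:
        # Client-side recycling: retire an idle connection before reusing it.
        if conn is not None and recycle_s is not None and now - conn.last_used > recycle_s:
            conn = None
        if conn is None:
            conn = Connection(last_used=now)  # open a fresh connection
        elif not proxy_still_has(conn, now, proxy_timeout_s):
            failures += 1  # client believed the connection was open; the proxy did not
            conn = Connection(last_used=now)  # reconnect for the next query
            continue
        conn.last_used = now
    return failures


times = [0, 100, 800, 900, 2000]  # hypothetical query timestamps, in seconds
print(count_failures(times, proxy_timeout_s=300))                 # 2
print(count_failures(times, proxy_timeout_s=300, recycle_s=200))  # 0
print(count_failures(times, proxy_timeout_s=72000))               # 0
```

In SQLAlchemy the client-side knob is the `pool_recycle` argument of `create_engine`; as far as I know, oslo.db-based services of this era exposed it as `idle_timeout` in the `[database]` section, but check the option name for your release.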



I had exactly the same problem, except that MySQL kept closing connections that the OpenStack services were still using. That's why I increased the timeout in the MySQL settings!

If you solved this problem, please close the question!

capsali (2015-04-07 10:41:46 -0500)

answered 2015-03-23 14:01:26 -0500

Just a suggestion: why not use memcache to store your tokens? It should respond faster and give your database some much-needed relief.




This is not an immediate solution because we have multiple load-balanced Keystone servers.

MentheAlow (2015-04-03 12:19:52 -0500)
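To illustrate the objection: if each Keystone node kept tokens in its own local memcached, a token issued by one node would be unknown to the others. The hypothetical Python model below compares per-node stores with one shared store; strict alternation between nodes is a worst case, not real load-balancer behaviour. The shared case roughly corresponds to pointing every Keystone node at the same memcached server list, assuming the memcache token backend of your release supports that.

```python
import itertools
import uuid
from typing import Dict, List


def validation_failures(stores: List[Dict[str, bool]], requests: int) -> int:
    """Round-robin between nodes: each token is issued on one node and validated
    on the next. stores[i] is node i's token store; passing the same dict for
    every node models a store that all nodes share."""
    nodes = itertools.cycle(range(len(stores)))
    failures = 0
    for _ in range(requests):
        token = uuid.uuid4().hex
        stores[next(nodes)][token] = True      # issue: write to this node's store
        if token not in stores[next(nodes)]:   # validate: read from the next node's store
            failures += 1
    return failures


local = validation_failures([{}, {}], requests=10)  # each node has its own cache
shared_store: Dict[str, bool] = {}
shared = validation_failures([shared_store, shared_store], requests=10)
print(local, shared)  # 10 0
```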

Your blog post is wrong. You have to use pt-archiver instead of deleting expired tokens with standard database commands, because it prevents the Keystone database from being locked for long periods while the rows containing expired tokens are deleted.

MentheAlow (2015-04-03 12:25:56 -0500)
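The point of the pt-archiver advice is to purge in small chunks (see its `--limit` and `--commit-each` options) instead of one huge DELETE that holds locks for a long time. Below is a minimal Python sketch of that batching idea, not pt-archiver itself, run against an in-memory SQLite database with a hypothetical, simplified `token` table (`id`, integer `expires`); the real Keystone schema differs.

```python
import sqlite3


def purge_expired_tokens(conn: sqlite3.Connection, now: int, batch_size: int = 1000) -> int:
    """Delete rows whose `expires` is before `now`, at most batch_size rows per
    transaction, committing between batches so locks are held only briefly."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM token WHERE id IN "
            "(SELECT id FROM token WHERE expires < ? LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()  # release locks before starting the next batch
        if cur.rowcount <= 0:
            return total
        total += cur.rowcount


# Demo: 2500 tokens with expires = 0..2499; everything below 2000 is expired.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires INTEGER)")
db.executemany("INSERT INTO token VALUES (?, ?)", [(f"t{i}", i) for i in range(2500)])
db.commit()
deleted = purge_expired_tokens(db, now=2000, batch_size=500)
remaining = db.execute("SELECT COUNT(*) FROM token").fetchone()[0]
print(deleted, remaining)  # 2000 500
```

MySQL does not accept LIMIT inside an IN subquery, so there the equivalent batch statement would be `DELETE FROM token WHERE expires < %s LIMIT %s`.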

answered 2015-04-07 09:13:29 -0500

capsali

I used to have the same problem with almost the same setup. I use a Galera MariaDB 5.5 cluster because, when I installed Neutron, a bug prevented it from populating its database on MariaDB 10.

When I put MariaDB behind HAProxy, I ran into exactly the same problem. Instead, I configured a VIP on all the Galera nodes and pointed every [database] section in the OpenStack configuration files to that VIP.

So I would try removing the Galera cluster from the HAProxy configuration and using a VIP for MySQL instead.

The problem nearly disappeared. I also increased wait_timeout to 28800 and connect_timeout to 20 in my.cnf, because I was facing an issue where an OpenStack agent would try to use a socket that MySQL had already closed due to inactivity.
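Both fixes in this thread follow the same rule: the client should be the first hop to close an idle connection, otherwise it reuses a socket that HAProxy or MySQL has already closed. The hypothetical Python helper below checks that ordering. The hop labels and the 3600 s client recycle time are assumptions; 28800 s is the wait_timeout from this answer, and for HAProxy you would plug in whichever idle timeout applies to your listener (`timeout client`/`timeout server`, or `timeout tunnel` if set).

```python
from typing import Dict


def first_to_close_idle(idle_timeouts_s: Dict[str, float]) -> str:
    """Name of the hop that closes an idle connection first (smallest timeout)."""
    return min(idle_timeouts_s, key=lambda hop: idle_timeouts_s[hop])


def client_recycles_first(idle_timeouts_s: Dict[str, float], client_key: str = "client") -> bool:
    """True when the client's recycle time is strictly shorter than every other
    hop's idle timeout, so it never reuses a connection another hop has closed."""
    client = idle_timeouts_s[client_key]
    return all(t > client for hop, t in idle_timeouts_s.items() if hop != client_key)


# Hypothetical values: 1 h client recycle, 10 min HAProxy, 8 h MySQL wait_timeout.
path = {"client": 3600, "haproxy": 600, "mysql": 28800}
print(first_to_close_idle(path))    # haproxy
print(client_recycles_first(path))  # False

# The "obscenely high" HAProxy timeout (20 h) restores the ordering.
print(client_recycles_first({"client": 3600, "haproxy": 72000, "mysql": 28800}))  # True
```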


answered 2015-04-03 12:11:52 -0500

MentheAlow

updated 2015-04-04 01:53:07 -0500

Same issue here.

My setup has 3 controllers with MySQL/Galera + HAProxy (every MySQL server is in HAProxy "backup" mode, so only one is actually used). How to reproduce: launch 10 instances in Horizon, and about a third of them get stuck in "Building - No State" with this error in the Keystone logs:

2015-04-03 19:13:46.663 10695 WARNING keystone.common.wsgi [-] Could not find token: 879b4ff1d25a4974ace49cf293734aba

I partially solved it by changing the HAProxy balancing mode from round-robin to source for the Keystone public and admin clusters. I still have some missing tokens, but at least instance creation no longer gets blocked. Hope it helps.
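A rough Python sketch of the difference between the two balancing modes: round-robin hands consecutive requests to successive backends, while `balance source` hashes the client address so a given client keeps hitting the same backend as long as the server set does not change. HAProxy uses its own hash and server weights; SHA-256 modulo the server count is only a stand-in, and the backend names and client address are hypothetical.

```python
import hashlib
import itertools
from typing import List


def source_pick(client_ip: str, servers: List[str]) -> str:
    """Map a client address to a fixed backend (illustrative hash, not HAProxy's)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


servers = ["keystone1", "keystone2", "keystone3"]  # hypothetical backend names

rr = itertools.cycle(servers)
round_robin = [next(rr) for _ in range(4)]
sticky = [source_pick("10.0.0.5", servers) for _ in range(4)]

print(round_robin)       # ['keystone1', 'keystone2', 'keystone3', 'keystone1']
print(len(set(sticky)))  # 1
```

Pinning each client to one Keystone node narrows, but does not remove, the chance that related requests are served by different nodes, which fits the "partially solved" result above.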





Asked: 2015-03-11 14:45:39 -0500

Seen: 2,342 times

Last updated: Apr 07 '15