asked 2020-05-01 09:38:16 -0500

I have 3 control nodes with shared storage, a VRRP management address, and HAProxy, all deployed via Kolla and therefore running in containers. For some reason I can only access Horizon when the node that I deployed everything from is running. When I shut that node down as a test, I can still reach the Horizon login page, but logging in hangs for a long time and then fails with a 504 timeout.

I'm not sure where to start looking. I suspect the Keystone auth is timing out, but I don't know why.

There are no errors that I can see in the Keystone or memcached logs.

At one point I tried editing the HAProxy configuration for MariaDB/memcached, removing "backup" from the other servers, and got a different error: in that case it flatly refused the login for Admin.

Any ideas on where I should start looking first? I've tried reinstalling from scratch, thinking I'd done something wrong the first time, but it still happens!

Thanks, Jon.

2 answers

answered 2020-05-02 08:52:53 -0500

My gut feeling is that something is wrong with the routing table on the controllers. It is possible that they are using the control-plane gateway as the default gateway instead of the public one. To start debugging, I would stop the deployer node and use the CLI from another machine to access OpenStack (not Horizon yet). Are you able to use the CLI to spawn a VM, for example? Another theory is that Apache only allows connections from specific IP ranges; this is configurable and can, for example, be set to allow all.
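A quick way to test the routing theory before going further is to check, from one of the surviving controllers while the deployer node is stopped, whether the services that a Horizon login depends on are reachable at all. The following is only a rough sketch in Python: the port numbers are the usual defaults for Keystone, memcached, and MariaDB and are an assumption about your deployment, and the function names `port_open` and `probe` are made up for this example.

```python
import socket

# Usual default ports for the services a Horizon login touches.
# Assumption: your deployment uses these defaults; adjust if it does not.
SERVICE_PORTS = {"keystone": 5000, "memcached": 11211, "mariadb": 3306}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable/unresolvable hosts.
        return False


def probe(hosts, ports=SERVICE_PORTS, timeout=3.0):
    """Map each (host, service name) pair to whether its port accepted a connection."""
    return {
        (host, name): port_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }
```

Calling `probe([...])` with the internal addresses of the remaining controllers and the VIP, then printing the result, shows which service stops answering once the deployer node is down. A port that only answers on the deployer node's address would point at the backend selection in HAProxy rather than at routing. Note that this only checks TCP reachability, not whether the service behind the port is healthy.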

answered 2020-05-01 09:40:19 -0500

Just to add, everything else has been working fine: storage, compute, live migrations, etc. Only dropping this particular node kills the login.

Also, the "management" and main endpoint network is a WireGuard interface, wg0, although I'm not sure why that would affect anything other than the performance of Ceph running over it, which can get a little slow.

