nova-api really slow to respond
I've got a pretty vanilla Queens OpenStack setup. I followed all the defaults from the official docs, and the only big difference is that this cloud is backed by Ceph storage. Most CLI commands take 10-40 seconds to respond, and I've narrowed it down to nova-api: all the debugging logs I've been able to dig up show the commands waiting on GET requests to port 8774. Commands like "openstack user list" or "openstack token issue" respond immediately, but commands like "openstack server list" or "openstack volume list" hang for 10-40 seconds waiting for nova-api to respond.
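One way to check where those 10-40 seconds go is to time the TCP connect to port 8774 separately from the wait for the first byte of the response. If the connect is instant and the first byte is slow, the time is being spent inside nova-api (or in something it calls) rather than on the network. Below is a minimal sketch using only the Python standard library. The function name, host, port and raw request are placeholders for illustration, not anything OpenStack provides.

```python
import socket
import time
from typing import Tuple


def time_connect_and_first_byte(host: str, port: int, request: bytes,
                                timeout: float = 60.0) -> Tuple[float, float]:
    """Return (connect_seconds, first_byte_seconds) for one raw request.

    connect_seconds covers only the TCP handshake; first_byte_seconds is the
    time between sending the request and receiving the first response byte.
    """
    start = time.monotonic()
    # create_connection also applies `timeout` to later send/recv calls.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.sendall(request)
        # Blocks until the server sends its first byte (or the timeout hits).
        sock.recv(1)
        first_byte = time.monotonic()
    return connected - start, first_byte - connected
```

For example, you could pass "controller", 8774 and a hand-written GET for /v2.1/<project_id>/servers/detail that carries an X-Auth-Token header and Connection: close. If the delay really is inside nova-api, the first number should be near zero and the second close to the 10-40 seconds seen from the CLI.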
Strace shows it as:
connect(4, {sa_family=AF_INET, sin_port=htons(8774), sin_addr=inet_addr("<controller_ip_address>")}, 16) = 0
sendto(4, "GET /v2.1/6b3bf6c001124e5e99e6d3"..., 405, 0, NULL, 0) = 405
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR) = 0
recvfrom(4,
# Long wait time here
"HTTP/1.1 200 OK\r\nContent-Length:"..., 8192, 0, NULL, NULL) = 2036
Running openstack with the --debug flag shows the wait here:
REQ: curl -g -i -X GET http://controller:8774/v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}04b713b7f7975229a9b12a6915ec4d89a3196670"
Starting new HTTP connection (1): controller
# Long wait time here
http://controller:8774 "GET /v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail HTTP/1.1" 200 1636
RESP: [200] Content-Length: 1636 Content-Type: application/json Openstack-Api-Version: compute 2.1 X-Openstack-Nova-Api-Version: 2.1 Vary: OpenStack-API-Version, X-OpenStack-Nova-API-Version X-Openstack-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 X-Compute-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 Date: Tue, 05 Jun 2018 14:06:35 GMT Connection: keep-alive
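The X-Openstack-Request-Id in that response (req-4ea8af6e-...) is worth using. nova-api normally writes the request ID into its log lines, so grepping the nova-api log for it should show what the service was doing during the wait. Below is a sketch that replays the same GET as the --debug line with the Python standard library, times it, and returns the request ID. The function name is made up, and the URL and token are assumptions you'd supply yourself, for example a token from "openstack token issue".

```python
import time
import urllib.request
from typing import Optional, Tuple


def timed_nova_get(url: str, token: str,
                   timeout: float = 120.0) -> Tuple[float, int, Optional[str]]:
    """Issue one GET like the novaclient debug line and time it.

    Returns (elapsed_seconds, http_status, request_id), where request_id is
    the X-Openstack-Request-Id response header (None if it is absent).
    """
    req = urllib.request.Request(url, headers={
        "Accept": "application/json",
        "X-Auth-Token": token,
    })
    # Bypass any http_proxy environment settings so only nova-api is timed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    with opener.open(req, timeout=timeout) as resp:
        resp.read()  # include the full body transfer in the timing
        elapsed = time.monotonic() - start
        return elapsed, resp.status, resp.headers.get("X-Openstack-Request-Id")
```

For example, timed_nova_get("http://controller:8774/v2.1/<project_id>/servers/detail", token) should take roughly as long as "openstack server list" if nova-api is the bottleneck, and the returned ID is what to search for in the log.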
The server has more than enough resources (a 16-core CPU and 64 GB of RAM), and it's mostly idle because I'm still getting it up and running. I've looked into tuning MySQL, and aside from mysqltuner.pl complaining about too many JOINs being used without indexes, MySQL looks fine. There's no disk contention or CPU steal that I can see, so my theory is that nova-api is timing out on something. I read somewhere that someone fixed a similar issue by commenting out the "::1 localhost" line in /etc/hosts because it turned out to be an IPv6 issue for them, but that didn't fix it for me.
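If nova-api really is waiting on a timeout, slow name resolution on the controller is a cheap thing to rule out. It's the same class of problem that the "::1 localhost" /etc/hosts change targets. Below is a sketch that times getaddrinfo() separately for IPv4 and IPv6. The function names are illustrative, and the host names you pass in (for example controller or localhost) are up to you. A lookup that takes seconds, or an IPv6 lookup that hangs before failing, would point in that direction.

```python
import socket
import time
from typing import Iterable, List, Tuple


def time_resolution(name: str, family: int = socket.AF_UNSPEC) -> Tuple[float, List[str]]:
    """Time one getaddrinfo() lookup; return (seconds, sorted unique addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(name, None, family, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


def report(names: Iterable[str]) -> None:
    """Print IPv4 and IPv6 lookup times for each name; failures are reported, not raised."""
    for name in names:
        for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
            try:
                seconds, addrs = time_resolution(name, family)
                print(f"{name} {label}: {seconds:.3f}s {addrs}")
            except socket.gaierror as exc:
                print(f"{name} {label}: lookup failed ({exc})")
```

Running report(["controller", "localhost"]) on the controller itself is the relevant test, since that's where nova-api opens its connections.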
Has anyone else come across this and found a fix? It's not preventing the cloud from working, because everything responds eventually, but it's affecting Horizon to the point where each page load takes 30-60 seconds, so I'd like to track it down.
Can't tell if it's related, but we had major performance issues with Horizon because our memcached was not configured correctly. Sometimes listing all instances in the admin panel timed out, even though there were only 20 of them. Fixing the memcache config reduced this to 5 to 8 seconds.
What do you mean by fixing it? The only default option I changed was removing the "-l" flag altogether so memcached listens on all addresses. I just tested setting it to listen specifically on its public IP and the loopback IPs, and commands do seem to return within 10 seconds now.
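Since changing memcached's listen address made a visible difference, it may be worth checking from the controller that memcached answers quickly on every address the services are configured to use. The Queens install guide uses controller:11211 for memcached_servers. Below is a minimal probe that sends the memcached text protocol's version command and times the reply. The function name is illustrative, and the addresses to try depend on your setup.

```python
import socket
import time
from typing import Tuple


def probe_memcached(host: str, port: int = 11211, timeout: float = 5.0) -> Tuple[float, str]:
    """Send the memcached text-protocol 'version' command and time the reply.

    Returns (seconds, reply_line), e.g. a line starting with "VERSION".
    Raises OSError (such as ConnectionRefusedError or a timeout) if nothing
    answers on that address.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"version\r\n")
        reply = b""
        # The reply is a single CRLF-terminated line; read until we have it.
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break
            reply += chunk
    return time.monotonic() - start, reply.decode("ascii", "replace").strip()
```

Calling it for 127.0.0.1, ::1, the public IP and "controller" in turn would show whether any configured address is refused or slow to answer.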
We mixed up the configs for memcache:
This boosted our performance! Also we bind memcached to specific addresses:
Thanks, eblock. That looked promising, but it didn't make any difference in performance. It actually broke how the VNC tokens get generated/verified, so the VNC console stopped working until I reverted the change. I haven't done much more testing recently, but it's still an issue here.
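eblock's actual memcache config didn't survive in this thread, so the following is not their fix. As a hedged sketch, assume the mix-up is about different sections pointing at different memcached addresses. Under that assumption, here is a way to list every memcached_servers / memcache_servers setting in a config file such as /etc/nova/nova.conf and see whether they disagree. The function names are made up for illustration.

```python
import configparser
from typing import Dict, Set


def memcache_settings(conf_text: str) -> Dict[str, str]:
    """Map 'section.option' -> value for every memcache server list in the text.

    Matches both spellings: memcached_servers (e.g. [keystone_authtoken]) and
    memcache_servers (e.g. the [cache] section).
    """
    # oslo.config files may repeat keys and contain '%', so disable strict
    # duplicate checking and interpolation. Renaming the default section
    # makes [DEFAULT] parse as an ordinary section instead of leaking its
    # values into every other section.
    parser = configparser.ConfigParser(strict=False, interpolation=None,
                                       default_section="__no_default__")
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for option in ("memcached_servers", "memcache_servers"):
            if parser.has_option(section, option):
                found[f"{section}.{option}"] = parser.get(section, option)
    return found


def distinct_server_lists(settings: Dict[str, str]) -> Set[str]:
    """Normalise each comma-separated list so ordering and spaces don't matter."""
    return {
        ",".join(sorted(s.strip() for s in value.split(",") if s.strip()))
        for value in settings.values()
    }
```

For example, if distinct_server_lists(memcache_settings(open("/etc/nova/nova.conf").read())) returns more than one entry, then at least two sections point memcache clients at different servers.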