nova-api really slow to respond

I have a fairly vanilla Queens OpenStack setup that follows the defaults from the official docs; the only big difference is that this cloud is backed by Ceph storage. Most CLI commands take 10-40 seconds to respond, and I've narrowed it down to nova-api. All the debug logs I've been able to dig up show the commands waiting on GET requests to port 8774. Commands like "openstack user list" or "openstack token issue" respond immediately, but commands like "openstack server list" or "openstack volume list" hang for 10-40 seconds waiting for nova-api to respond.

Strace shows it as:

connect(4, {sa_family=AF_INET, sin_port=htons(8774), sin_addr=inet_addr("<controller_ip_address>")}, 16) = 0
sendto(4, "GET /v2.1/6b3bf6c001124e5e99e6d3"..., 405, 0, NULL, 0) = 405
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR)               = 0

# Long wait time here

"HTTP/1.1 200 OK\r\nContent-Length:"..., 8192, 0, NULL, NULL) = 2036

Using openstack with the --debug flag shows it here:

REQ: curl -g -i -X GET http://controller:8774/v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}04b713b7f7975229a9b12a6915ec4d89a3196670"
Starting new HTTP connection (1): controller

# Long wait time here

http://controller:8774 "GET /v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail HTTP/1.1" 200 1636
RESP: [200] Content-Length: 1636 Content-Type: application/json Openstack-Api-Version: compute 2.1 X-Openstack-Nova-Api-Version: 2.1 Vary: OpenStack-API-Version, X-OpenStack-Nova-API-Version X-Openstack-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 X-Compute-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 Date: Tue, 05 Jun 2018 14:06:35 GMT Connection: keep-alive
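
To see which phase of that request the wait falls in, something like the following could time name resolution, the TCP connect, and the wait for the response headers separately. It's only a rough sketch: `time_request_phases` is a name made up for illustration, it uses nothing beyond the Python standard library, and the host, port, and path in the usage comment are the ones from the debug output above, with the token left as a placeholder.

```python
import http.client
import socket
import time


def time_request_phases(host, port, path, headers=None, timeout=120):
    """Return seconds spent resolving, connecting, and waiting for the response."""
    timings = {}

    # Resolve the name ourselves so lookup time is measured on its own.
    start = time.monotonic()
    family, _, _, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["resolve"] = time.monotonic() - start

    # Time the TCP handshake against the first resolved address.
    start = time.monotonic()
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    sock.connect(sockaddr)
    timings["connect"] = time.monotonic() - start

    # Reuse the connected socket so http.client does not reconnect.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    conn.sock = sock
    start = time.monotonic()
    conn.request("GET", path, headers=headers or {})
    response = conn.getresponse()  # returns once status line and headers arrive
    timings["first_byte"] = time.monotonic() - start

    # Time how long the rest of the body takes to arrive.
    start = time.monotonic()
    body = response.read()
    timings["body"] = time.monotonic() - start

    timings["status"] = response.status
    timings["bytes"] = len(body)
    conn.close()
    return timings


# Hypothetical usage (TOKEN is a placeholder for a real token):
# print(time_request_phases(
#     "controller", 8774,
#     "/v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail",
#     headers={"Accept": "application/json", "X-Auth-Token": "TOKEN"},
# ))
```

If "resolve" and "connect" come back near zero and nearly all the time is in "first_byte", the delay would be inside nova-api (or whatever it calls) rather than in the network path, which is what the strace output above seems to suggest.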

The server has more than enough resources (a 16-core CPU and 64 GB of RAM) and is mostly idle because I'm still getting it up and running. I've looked into tuning MySQL, and aside from complaints about too many JOINs being run without indexes, MySQL looks fine. There's no disk contention or CPU steal that I can see, so my theory is that nova-api is timing out on something. I read somewhere that someone fixed a similar issue by commenting out the "::1 localhost" line in /etc/hosts, because it turned out to be an IPv6 issue for them, but that didn't fix it for me.
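
One way to check the IPv6 theory more directly would be to try a TCP connect to every address a name resolves to and time each attempt. The sketch below is an assumption about how that check could look, not something from the docs: `probe_addresses` is a made-up helper name, and the hosts and ports to try (for example "localhost" or "controller" with the ports of the services nova-api talks to) would depend on the deployment.

```python
import socket
import time


def probe_addresses(host, port, timeout=5.0):
    """Try a TCP connect to each address `host` resolves to and time it."""
    results = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        start = time.monotonic()
        try:
            # One socket per resolved address, in the order the resolver gave.
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except OSError as exc:
            outcome = f"failed: {exc}"
        elapsed = time.monotonic() - start
        results.append((label, sockaddr[0], outcome, elapsed))
    return results


# Hypothetical usage:
# for label, address, outcome, elapsed in probe_addresses("controller", 8774):
#     print(f"{label} {address}: {outcome} after {elapsed:.2f}s")
```

An IPv6 entry that is listed first and only fails after several seconds would point at the same kind of problem the /etc/hosts change was meant to work around; if every address connects instantly, the delay is probably somewhere else.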

Has anyone else come across this and found a fix? It isn't preventing the cloud from working, since everything responds eventually, but it's affecting Horizon to the point where each page load takes 30-60 seconds, so I'd like to track down the cause.