
nova-api really slow to respond

asked 2018-06-05 09:20:41 -0600 by noxiousnick

I've got a pretty vanilla Queens OpenStack setup, following all the defaults from the official docs, with the only big difference being that this cloud is backed by Ceph storage. Most of the CLI commands take 10-40 seconds to respond, and I've narrowed it down to nova-api. All the debug logs I've been able to dig up show the commands waiting on GET requests to port 8774. Commands like "openstack user list" or "openstack token issue" respond immediately, but things like "openstack server list" or "openstack volume list" hang for 10-40 seconds waiting for nova-api to respond.
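For reference, here's a quick way to reproduce the split (a minimal sketch; these are the same commands described above, just wrapped in "time"):

# Keystone-only call: responds immediately
time openstack token issue

# Call that hits nova-api on port 8774: hangs for 10-40 seconds
time openstack server list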

Strace shows it as:

connect(4, {sa_family=AF_INET, sin_port=htons(8774), sin_addr=inet_addr("<controller_ip_address>")}, 16) = 0
sendto(4, "GET /v2.1/6b3bf6c001124e5e99e6d3"..., 405, 0, NULL, 0) = 405
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR)               = 0
recvfrom(4,

# Long wait time here

"HTTP/1.1 200 OK\r\nContent-Length:"..., 8192, 0, NULL, NULL) = 2036

Using openstack with the --debug flag shows it here:

REQ: curl -g -i -X GET http://controller:8774/v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}04b713b7f7975229a9b12a6915ec4d89a3196670"
Starting new HTTP connection (1): controller

# Long wait time here

http://controller:8774 "GET /v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail HTTP/1.1" 200 1636    RESP: [200] Content-Length: 1636 Content-Type: application/json Openstack-Api-Version: compute 2.1 X-Openstack-Nova-Api-Version: 2.1 Vary: OpenStack-API-Version, X-OpenStack-Nova-API-Version X-Openstack-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 X-Compute-Request-Id: req-4ea8af6e-437c-4f39-a2de-b10cc9e81672 Date: Tue, 05 Jun 2018 14:06:35 GMT Connection: keep-alive
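Since the strace shows connect() returning instantly and recvfrom() blocking, the wait looks server-side. One way to confirm is to hit the same URL with curl's timing variables (a sketch; $TOKEN is a placeholder for a token from "openstack token issue"):

# Breaks the request down into DNS lookup, TCP connect, and time to first byte.
# If time_starttransfer dominates, nova-api itself is slow to answer.
curl -s -o /dev/null \
  -H "X-Auth-Token: $TOKEN" \
  -w "namelookup:    %{time_namelookup}s\nconnect:       %{time_connect}s\nstarttransfer: %{time_starttransfer}s\ntotal:         %{time_total}s\n" \
  http://controller:8774/v2.1/6b3bf6c001124e5e99e6d33f4bacdf64/servers/detail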

The server has more than enough resources, with a 16-core CPU and 64 GB of RAM, and it's mostly idle because I'm still getting the cloud up and running. I've looked into tuning MySQL, and aside from mysqltuner.pl complaining about too many JOINs being used without indexes, MySQL looks fine. There's no disk contention or CPU steal that I can see, so my theory is that nova-api is timing out on something. I read that someone fixed a similar issue by commenting out the "::1 localhost" line in /etc/hosts, because it turned out to be an IPv6 issue for them, but that didn't fix it for me.
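Along the same lines as the /etc/hosts theory, a quick sanity check can rule name resolution in or out (run on the controller; substitute the address of the machine running the CLI):

# Slow results here would point at DNS rather than nova-api itself.
time getent hosts controller
time getent hosts <client_ip_address>   # reverse lookup of the CLI client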

Has anyone else come across this and know of a fix? It's not preventing the cloud from working, because everything responds eventually, but it's affecting Horizon to the point where each page load takes 30-60 seconds, so I would like to track down a fix.


Comments

Can't tell if it's related, but we had major performance issues with Horizon because our memcached was not configured correctly. Sometimes we had timeouts when listing all instances in the admin panel, although there were only 20 of them. Fixing the memcached config reduced this to 5-8 seconds.

eblock ( 2018-06-06 02:03:20 -0600 )

What do you mean by fixing it? The only default option I changed was removing the "-l" flag altogether to let it listen globally. I just tested updating it to listen specifically on its public and loopback IPs, and commands do seem to be returning within 10 seconds now.

noxiousnick ( 2018-06-06 06:44:32 -0600 )

We mixed up the configs for memcache:

# old:
[memcache]
servers = localhost:11211

# new
[cache]
memcache_servers = localhost:11211

This boosted our performance! We also bind memcached to specific addresses:

MEMCACHED_PARAMS="-l 127.0.0.1,<CONTROL_IP>"
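To verify the [cache] section is actually being used, you can watch memcached's hit counters move while reloading a Horizon page (a sketch, assuming memcached listens on 127.0.0.1:11211 as configured above; some nc builds need -w instead of -q):

# Counters should increase between runs if Keystone/Horizon really use the cache.
echo stats | nc -q 1 127.0.0.1 11211 | egrep 'get_hits|get_misses|curr_connections'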
eblock ( 2018-06-07 03:47:44 -0600 )

Thanks eblock, that looked promising, but it didn't make any difference in performance. It actually broke how the VNC tokens get generated/verified, so the VNC console stopped working until I reverted the change. I haven't done much more testing recently, but it's still an issue here.

noxiousnick ( 2018-06-14 10:57:35 -0600 )

1 answer


answered 2018-06-11 06:46:06 -0600 by vishwanath shivappa

Facing the same issue. Only the Keystone-related calls respond quickly; everything else takes a long time.

[root@cassini ~]# time glance image-list
+----+------+
| ID | Name |
+----+------+
+----+------+

real    2m3.914s
user    0m1.736s
sys     0m0.146s

[root@cassini ~]# time openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                     |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------+
| 376a6b2b838b4f868180335fdac9199e | RegionOne | keystone     | identity     | True    | public    | http://cassini:5000/v3/ |
| 68cb76bcdbb542278f259a516ec8d3d3 | RegionOne | keystone     | identity     | True    | admin     | http://cassini:5000/v3/ |
| 9a4b9847c55043789fa7f592b2a10261 | RegionOne | glance       | image        | True    | internal  | http://cassini:9292     |
| ceb3f5611cc64e8084316bbf5e899e9f | RegionOne | keystone     | identity     | True    | internal  | http://cassini:5000/v3/ |
| e02a468b47ae4109b5786309a3dcb281 | RegionOne | glance       | image        | True    | admin     | http://cassini:9292     |
| f2e6f18248b14419b74ec0406a04bc23 | RegionOne | glance       | image        | True    | public    | http://cassini:9292     |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------+

real    0m3.874s
user    0m1.920s
sys     0m0.169s

[root@cassini ~]# time openstack user list
+----------------------------------+--------+
| ID                               | Name   |
+----------------------------------+--------+
| 1b5f861b74f24574992c0ccc0cbf6333 | admin  |
| 8ff793190f9d4829816c457abe65b3d4 | demo   |
| be859b477bec481da3fc3af78344f033 | glance |
+----------------------------------+--------+

real    0m3.649s
user    0m1.914s
sys     0m0.167s

Please let me know the fix if you find one.


Comments

Keystone does indeed seem to work fine, but more than two minutes for a glance call is unbelievably long! Is the network infrastructure special in any way? Have you tried tracing the packets to your glance host? I don't think the memcache issue applies here.
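For example, you could time the bare image API and watch the wire at the same time (a sketch; $TOKEN is a placeholder for a valid token):

# Time the raw glance endpoint, bypassing python-client overhead
time curl -s -H "X-Auth-Token: $TOKEN" http://cassini:9292/v2/images > /dev/null

# In a second shell: look for retransmits or long gaps on the glance port
tcpdump -i any -nn port 9292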

eblock ( 2018-06-14 10:13:30 -0600 )
