Ocata: cinder-volume causes high CPU load (Ceph backend)

We recently upgraded our cloud to Ocata; Ceph (Luminous) is our storage backend for Glance, Cinder and Nova. Since the upgrade, cinder-volume has been consuming 100% CPU on the control node and opening lots of TCP connections to the Ceph cluster. We expect this from the compute nodes, of course, but why does the control node connect to Ceph all the time? The same question has already been asked here, but without any answers or comments. Has anyone experienced something similar and could shed some light on this?

Thanks!

EDIT: I compared the Cinder code in Ocata to Newton and Mitaka, and there is indeed a new function: for each existing volume it sends requests to the Ceph cluster to collect usage information. So the connections are at least explainable, but I would also like to be able to tune them. I tried changing some of the config options (rados_connection_interval, report_interval, periodic_interval, periodic_fuzzy_delay), but none of them had any effect on the connections. Worse, those changes caused the services in cinder service-list to flap, going up and down all the time. Does anyone have a hint on how to increase the interval between connections to the Ceph cluster?
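For reference, this is roughly what I experimented with in cinder.conf. The [ceph] section name and the values are specific to my setup and only meant as an example, not a recommendation; my assumption is that report_interval is the service heartbeat and has to stay below service_down_time, which would explain the flapping I saw:

    [DEFAULT]
    # interval (seconds) between periodic task runs; as far as I can tell,
    # driver stats collection is driven by these periodic tasks
    periodic_interval = 60
    # random delay (seconds) applied when starting the periodic task scheduler
    periodic_fuzzy_delay = 60
    # service heartbeat interval; my assumption: keep this well below
    # service_down_time, otherwise cinder service-list starts flapping
    report_interval = 10
    service_down_time = 60

    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    # interval (seconds) between connection retries to the Ceph cluster
    rados_connection_interval = 5

None of this reduced how often cinder-volume talks to the Ceph cluster, which is why I am asking whether the interval of that new per-volume usage query can be configured at all.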