
Ocata: cinder-volume causes high CPU load (Ceph backend)

asked 2017-10-17 03:10:22 -0500

eblock

updated 2017-10-18 02:35:33 -0500

We recently upgraded our cloud to Ocata; Ceph (Luminous) is the storage backend for Glance, Cinder and Nova. Since the upgrade, cinder-volume has been consuming 100% CPU on the control node and opening lots of TCP connections to the Ceph cluster. We expect this from the compute nodes, of course, but why does the control node connect to Ceph all the time? The same question has been asked here already, but without any answers or comments. Has anyone experienced something similar who could shed some light on this?

Thanks!

EDIT: I compared the Cinder code in Ocata with Newton and Mitaka, and there is indeed a new function. It sends a request to the Ceph cluster for each existing volume to get usage information. So these connections are at least explainable, but I would also like to be able to configure them. I tried changing some of the config options (rados_connection_interval, report_interval, periodic_interval, periodic_fuzzy_delay), but none of them had any effect on the connections. In fact, the changes caused the services shown by cinder service-list to flap, going up and down all the time. Does anyone have a hint on how to increase the interval between connections to the Ceph cluster?
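To illustrate why this kind of per-volume polling gets expensive, here is a minimal sketch of the pattern, not the actual driver code. FakeImage and UsageCollector are hypothetical stand-ins I made up so the example runs without a Ceph cluster; only the call shape diff_iterate(offset, length, from_snapshot, callback) and the callback arguments (offset, length, exists) are meant to mirror the Python rbd bindings.

```python
MiB = 1024 * 1024


class FakeImage:
    """Hypothetical stand-in for an opened RBD image (not rbd.Image)."""

    def __init__(self, size, extents):
        self._size = size
        self._extents = extents  # list of (offset, length) holding data

    def size(self):
        return self._size

    def diff_iterate(self, offset, length, from_snapshot, iterate_cb):
        # Mimic the callback style: one call per allocated extent in range.
        for ext_offset, ext_length in self._extents:
            if offset <= ext_offset < offset + length:
                iterate_cb(ext_offset, ext_length, True)


class UsageCollector:
    """Hypothetical helper that sums allocated bytes over all images."""

    def __init__(self):
        self.total_usage = 0
        self.callbacks = 0

    def _iterate_cb(self, offset, length, exists):
        self.callbacks += 1
        if exists:
            self.total_usage += length

    def collect(self, images):
        # Every image is walked over its full size on every stats run.
        for image in images:
            image.diff_iterate(0, image.size(), None, self._iterate_cb)
        return self.total_usage


# Three 100 MiB images, each with ten allocated 4 MiB extents.
images = [
    FakeImage(100 * MiB, [(i * 4 * MiB, 4 * MiB) for i in range(10)])
    for _ in range(3)
]
collector = UsageCollector()
used = collector.collect(images)
print(used // MiB, "MiB used,", collector.callbacks, "callbacks")
```

The work grows with the number of volumes times the number of allocated extents per volume, and in the real driver each volume also has to be opened on the cluster first, which would fit the connection count and CPU load described above.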


1 answer


answered 2017-10-18 08:37:32 -0500

eblock

The problem has apparently already been fixed in the Pike release. The fix is a one-line change that replaces the diff_iterate() call with adding v.size() to the running total, which reduces the CPU load of cinder-volume.

control1:~ # diff -u /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py.dist /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py
--- /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py.dist  2017-10-17 12:16:01.936816297 +0200
+++ /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py       2017-10-18 15:05:39.105953958 +0200
@@ -367,7 +367,8 @@
                     # non-default volume_name_template settings.  Template
                     # must start with "volume".
                     with RBDVolumeProxy(self, t, read_only=True) as v:
-                        v.diff_iterate(0, v.size(), None, self._iterate_cb)
+                        self._total_usage += v.size()

     def _update_volume_stats(self):
         stats = {

https://review.openstack.org/#/c/508455/
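One consequence of the change worth keeping in mind, shown as a rough sketch under my own assumptions: summing v.size() yields the provisioned size of the volumes rather than the data actually allocated, so the two figures differ for thin-provisioned volumes. FakeVolume and both helper functions below are hypothetical illustrations, not Cinder code.

```python
GiB = 1024 ** 3


class FakeVolume:
    """Hypothetical stand-in for an opened RBD volume."""

    def __init__(self, size, allocated):
        self._size = size
        self._allocated = allocated

    def size(self):
        return self._size

    def diff_iterate(self, offset, length, from_snapshot, iterate_cb):
        # For simplicity, report all allocated data as a single extent.
        if self._allocated:
            iterate_cb(0, self._allocated, True)


def usage_before_patch(volumes):
    """Sum allocated bytes by walking the extents of every volume."""
    total = 0

    def iterate_cb(offset, length, exists):
        nonlocal total
        if exists:
            total += length

    for v in volumes:
        v.diff_iterate(0, v.size(), None, iterate_cb)
    return total


def usage_after_patch(volumes):
    """Sum the provisioned size of every volume (one cheap call each)."""
    total = 0
    for v in volumes:
        total += v.size()
    return total


# Two thin-provisioned volumes: 10 GiB with 1 GiB used, 20 GiB with 5 GiB used.
volumes = [FakeVolume(10 * GiB, 1 * GiB), FakeVolume(20 * GiB, 5 * GiB)]
before = usage_before_patch(volumes)
after = usage_after_patch(volumes)
print(before // GiB, "GiB allocated vs", after // GiB, "GiB provisioned")
```

In this made-up example the extent walk reports 6 GiB while the size-based sum reports 30 GiB, so the patched value is best read as provisioned capacity.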


Comments

I am running Pike, but my rbd.py did not have the fix. My version of cinder was 11.0.0; version 11.0.1, which includes the fix, was available.

jep (2017-11-20 15:58:03 -0500)
