
High si/sys values via top in instances

asked 2016-03-26 14:23:49 -0500 by BobH

Looking for some help to figure out what's going on here. I'm in the process of creating a third-party CI system for our project. Initially I'm trying to set up 6 manually created jenkins slaves, built with diskimage-builder and configured with puppet, to run gate jobs; I'll scale from there and eventually move to nodepool.

I don't think this is specific to devstack-gate. I suspect any system activity that stresses the instances would do the same thing, so we can just think of the jenkins slaves as compute node instances under heavy load.

My setup is as follows:

  1. Physical servers (2): Intel single-socket, 12 cores each (24 threads visible to the hypervisor with hyperthreading), 128 GB RAM.
  2. OpenStack Liberty installed as a 3-node deployment: 1 controller, 1 compute/network node (96 GB RAM), and a 2nd compute node (96 GB RAM), per the Liberty installation guide.
  3. The OpenStack controller and compute node guests were created by hand with libvirt on the respective physical servers, using a provider network with linuxbridge.
  4. Backing store for the jenkins slaves and for OpenStack Liberty is the local file system. Jenkins slaves are configured with puppet and images are built with diskimage-builder; this is the standard third-party setup described in the CI documentation.
  5. Jenkins slaves are 4 vCPU and 8 GB of RAM, 3 per compute node. CPU and memory are not over-committed, and I have verified that KVM acceleration is being used.
  6. All VM definitions use virtio for network and disk, and virtio-pci is installed. All VMs use host-passthrough as the CPU model in the libvirt.xml describing them (see the verification snippet after this list).
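
For reference, a minimal sketch of how points 5 and 6 can be sanity-checked on a compute node. These are generic libvirt/KVM commands; the domain name instance-00000001 is a placeholder, not one of my actual instances:

    # Confirm the KVM kernel modules are loaded (hardware acceleration available)
    lsmod | grep kvm

    # Dump the libvirt XML nova generated for one guest and confirm
    # host-passthrough and virtio show up (placeholder domain name)
    virsh dumpxml instance-00000001 | grep -E 'cpu mode|virtio'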

Trying to keep it simple as I learn the ropes...

All systems are using kernel 3.19.0-56-generic #62~14.04.1-Ubuntu SMP on Ubuntu 14.04.4 LTS (I've seen the same thing on earlier kernels and earlier 14.04 versions).

My issue is as follows:

If I create a single jenkins slave on a single compute node, the basic setup time to run devstack-gate (we'll ignore tempest, but a similar thing happens there) is roughly 20 minutes, sometimes less. As I scale the number of jenkins slaves on the compute node up to 3, the setup time increases dramatically on each instance. The last run I did had it at nearly an hour on each (all 3 running concurrently). Clearly something is wrong, as I have not over-committed CPU or memory on either of the compute nodes.
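
For anyone checking the same thing, a quick way to confirm the scheduler is not over-committing a compute node (compute1 is a placeholder hostname; this is the Liberty-era nova CLI):

    # Compare allocated vs. physical vCPUs and memory on the hypervisor
    nova hypervisor-show compute1 | grep -E 'vcpus|memory_mb'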

What I'm finding is that the CPUs are getting overwhelmed as I scale up the jenkins slaves. top shows sys/si percentages eating up the majority of CPU time; sometimes collectively they take 70-80% of it. This drops to what's shown below when the system becomes idle.

When the systems are idle (after one run) this is a typical view of top: mongodb is using 9.3% of the CPU, sys is at 9.8% and si at 5.2% of the available CPU (Irix mode off). The compute node and the physical server do not show this sort of load; they ...
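
In case it helps, a sketch of one way to break down where the sys/si time is going inside an instance. These are standard Linux tools (mpstat is part of the sysstat package) and nothing here is specific to this deployment:

    # Per-CPU breakdown of %sys and %soft, sampled every 2 seconds
    mpstat -P ALL 2

    # Which softirq classes are firing; NET_RX/NET_TX climbing quickly
    # points at network packet processing
    watch -d 'cat /proc/softirqs'

    # How virtio interrupts are spread across CPUs
    grep -i virtio /proc/interrupts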


1 answer


answered 2016-03-29 14:01:54 -0500 by BobH

Well, the jury is still out, but after making this change things are behaving much better.

Here: https://ask.openstack.org/en/question/46303/packets-to-vm-are-dropped-in-case-of-multiple-senders/?answer=47686#post-id-47686

I did this on both my physical servers, and the sys/si values are now at what I believe to be reasonable levels. Running tests now.

What led me to this was finding extremely variable ping times from the instances to my router, ranging from a fraction of a millisecond to over 100 ms. After this change and restarting everything, ping times are now 1 ms or less.
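
For completeness, reproducing the latency check is straightforward from inside an instance (192.168.1.1 stands in for the router/gateway address):

    # 100 pings to the gateway; the trailing lines summarize min/avg/max/mdev,
    # which makes the variance obvious
    ping -c 100 192.168.1.1 | tail -n 2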

