Openvswitch stops responding with higher CPU load [closed]

asked 2014-10-25 14:09:49 -0600

Yash

updated 2014-10-25 21:25:40 -0600

Hi,

I am facing a problem with openvswitch not responding to user intervention after the system has gone through a period of high CPU load. I am running RDO on RHEL. The OVS processes seem to be stuck on something. Even though the ovsdb-server and ovs-vswitchd processes are reported as running, they are non-responsive. I cannot kill or restart these processes, and I can't attach gdb to them either. Nothing is written to the OVS logs once this happens, and all ovs commands hang.

I found a thread where someone has reported this but it was closed a few months ago due to inactivity.

There was some heavy network traffic and a CPU spike that led to this problem. Can I recover from the problem without rebooting?

Versions:

RHEL: 6.5
Kernel: 2.6.32-431.el6.x86_64
Openvswitch version and release: 2014.1.2-1.el6 (through OpenStack Icehouse)

Some information from the ovsdb-server log:

2014-10-24T20:17:33Z|00020|poll_loop|INFO|wakeup due to [POLLIN] on fd 16 (/var/run/openvswitch/db.sock<->) at lib/stream-fd.c:142 (99% CPU usage)
2014-10-24T20:17:33Z|00021|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00022|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00023|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00024|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00025|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00026|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00027|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00028|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:33Z|00029|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)
2014-10-24T20:17:39Z|00030|poll_loop|INFO|Dropped 10014 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2014-10-24T20:17:39Z|00031|poll_loop|INFO|wakeup due to 0-ms timeout at unix (98% CPU usage)
2014-10-24T20:17:42Z|00032|jsonrpc|WARN|unix: send error: Broken pipe
2014-10-24T20:17:42Z|00033|reconnect|WARN|unix: connection dropped (Broken pipe)
2014-10-24T20:18:49Z|00034|fatal_signal|WARN|terminating with signal 15 (Terminated)
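
For anyone triaging a similar log, here is a minimal Python sketch that summarizes an excerpt like the one above. It assumes each record has the layout `timestamp|sequence|module|level|message`, as the lines in this post do; the function name `summarize` and the regular expressions are illustrative, not part of OVS.

```python
import re
from collections import Counter

# Field layout assumed from the excerpt in this post:
# timestamp|sequence|module|level|message
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)
CPU_RE = re.compile(r"\((\d+)% CPU usage\)")
DROPPED_RE = re.compile(r"Dropped (\d+) log messages")


def summarize(lines):
    """Count records per (module, level) and collect CPU and drop figures."""
    counts = Counter()
    cpu_samples = []
    dropped = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a log record
        counts[(m["module"], m["level"])] += 1
        cpu = CPU_RE.search(m["msg"])
        if cpu:
            cpu_samples.append(int(cpu.group(1)))
        drop = DROPPED_RE.search(m["msg"])
        if drop:
            dropped += int(drop.group(1))
    return counts, cpu_samples, dropped


# A few records copied from the log in this post.
sample = [
    "2014-10-24T20:17:33Z|00021|poll_loop|INFO|wakeup due to 0-ms timeout at unix (99% CPU usage)",
    "2014-10-24T20:17:39Z|00030|poll_loop|INFO|Dropped 10014 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate",
    "2014-10-24T20:17:39Z|00031|poll_loop|INFO|wakeup due to 0-ms timeout at unix (98% CPU usage)",
    "2014-10-24T20:17:42Z|00032|jsonrpc|WARN|unix: send error: Broken pipe",
]
counts, cpu_samples, dropped = summarize(sample)
print(counts, cpu_samples, dropped)
```

The repeated 0-ms timeout wakeups and the large dropped-message count suggest that ovsdb-server was spinning in its poll loop before the connection broke.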

Closed for the following reason: the question is answered, right answer was accepted by SamYaple
close date 2014-10-31 01:16:48.684692

2 answers


answered 2014-10-26 06:20:23 -0600

SamYaple

That's a tough one. If the process is not responding to SIGTERM or SIGKILL, there isn't much you can do. You could try hot-removing the OVS kernel module, but that will likely cause more problems, if it works at all.

I can't tell from the package name which OVS version that is, but you should try upgrading the kernel and Open vSwitch, as there have been a lot of bug fixes and updates.
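
One common reason a process ignores even SIGKILL is that it is blocked in the kernel in uninterruptible sleep, shown as state `D`. Whether that is what happened here is an assumption; the thread does not confirm it. A minimal Python sketch to check, reading `/proc/<pid>/stat` on Linux (the helper names `parse_stat` and `find_states` are hypothetical):

```python
import os


def parse_stat(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_text.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def find_states(names, proc_root="/proc"):
    """Yield (pid, comm, state) for processes whose comm is in names."""
    if not os.path.isdir(proc_root):
        return  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if comm in names:
            yield int(entry), comm, state


if __name__ == "__main__":
    for pid, comm, state in find_states({"ovsdb-server", "ovs-vswitchd"}):
        note = " (uninterruptible sleep)" if state == "D" else ""
        print(f"{pid} {comm} state={state}{note}")
```

If the daemons show up in state `D`, signals stay pending until the kernel call they are blocked in returns, which would be consistent with kill and gdb both failing.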


Comments

My OVS version is 1.11, commit id 8ce28d, built sometime in July 2013. Supposedly this is the RDO-supported version, so I'm a little hesitant to just upgrade OVS. Thanks for the suggestion though.

I changed the CPU interrupt affinity of the different eth ports to reduce the load, without success.

Yash (2014-10-26 10:02:42 -0600)
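
For readers unfamiliar with the interrupt affinity change mentioned in the comment above: on Linux, each IRQ has a `/proc/irq/<N>/smp_affinity` file that holds a hexadecimal CPU bitmask. The commenter did not say which IRQs or masks were used, so the values below are purely illustrative. A small Python sketch for converting between CPU lists and masks (the function names are hypothetical, and `cpus_to_mask` is limited to CPUs 0-31 for simplicity):

```python
def cpus_to_mask(cpus):
    """Return the hex bitmask string for a collection of CPU indices."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0-31")
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_text):
    """Return the CPU indices set in a hex mask such as 'f' or '0000000c'."""
    # Larger systems print the mask as comma-separated 32-bit groups;
    # dropping the commas gives one contiguous hex number.
    value = int(mask_text.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if value >> bit & 1]


print(cpus_to_mask([0, 2]))
print(mask_to_cpus("0000000c"))
```

Writing a mask to the `smp_affinity` file requires root, and a running irqbalance daemon may rewrite it afterwards.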

Using OVS with the RDO kernel might not be a good idea. I find the RDO kernel too old for hypervisors and OVS servers. It works considerably better with Ubuntu 14.04 and OVS. OVS needs a newer kernel to work properly.

xtrill (2014-10-27 12:08:20 -0600)

Unfortunately, "use a different distro" isn't normally an option. While I agree that it works better on Ubuntu due to the newer packages and kernel, Red Hat has done a lot of work to backport the OVS changes.

Larger companies sometimes have policies in place that require certain kernels.

SamYaple (2014-10-28 00:03:31 -0600)

Quite right...I am stuck with RDO/RHEL for now.

Yash (2014-10-28 01:02:12 -0600)

answered 2014-10-30 14:42:28 -0600

Yash

So after scratching my head for a week, I decided to try out a newer version of openvswitch. I built the sources for 2.3.0.1, created an RPM and tried it on the system. I have been testing this for a day and it looks good so far.

I am planning to test this out for a few more days and integrate on our systems if all goes well. In the meantime I have requested RDO to support a more recent version of OVS.

We can mark this thread as closed. Thanks everyone.


Comments

Glad to hear it.

I have personally only used OVS up to 2.1.0, but that works really well for me. It still has its problems, but I haven't run into the hang you describe.

SamYaple (2014-10-31 01:18:07 -0600)

Stats

Asked: 2014-10-25 14:09:49 -0600

Seen: 2,153 times

Last updated: Oct 30 '14