OpenStack Neutron stability problems with Open vSwitch

I have a fairly simple OpenStack setup for a PoC: 2 nodes, both running Nova, with everything else on node 1. It is running CentOS 6 and was set up using RDO. Importantly, I am using Neutron for the networking, with GRE tenant networks set up following the RDO docs for an existing network.

Periodically (every few days, I reckon) I lose all communication with Open vSwitch (and thus my instances). I know it is OVS, because I can SSH into node 2 and then connect to node 1 via their private network. The most telling thing I see in the logs is this:

unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
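
To put a firmer number on "every few days", something like the following could count how often that message shows up. It is only a minimal sketch: it assumes the message lands in a plain-text log whose path is passed on the command line, and the function name `find_db_errors` is made up for illustration.

```python
import sys

# The exact message quoted above; everything else here is an assumption.
MARKER = "database connection failed (Protocol error)"


def find_db_errors(lines):
    """Return (line_number, text) for every line that contains MARKER."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER in line
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_db_errors.py <path-to-log>
    with open(sys.argv[1], errors="replace") as log:
        for number, text in find_db_errors(log):
            print(f"{number}: {text}")
```

If the log lines carry timestamps, the matching lines should show roughly how far apart the failures are.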

In addition, OVS is using HUGE amounts of CPU (800% on my 16-core boxes), and when I try to do a clean shutdown it never completes, because ovsdb-server cannot be killed.

I have done some Googling and found some old suggestions, based on older OpenStack releases, where people had OVS/kernel version mismatches. As I am running the versions from RDO, I reckon I can discount that (unless Red Hat have made a massive screw-up).

Has anyone else seen this, or have any suggestions?

PS: Do not tell me to recompile Open vSwitch; for various reasons that is not happening in the immediate future.
