I have a fairly simple OpenStack setup for a proof of concept (PoC): two nodes, both running Nova, with everything else on node 1. The setup runs CentOS 6 and was deployed using RDO. Importantly, I am using Neutron for networking, with GRE tenant networks configured by following the RDO docs for an existing network.
Periodically (every few days, I reckon) I lose all communication with Open vSwitch, and therefore with my instances. I know the problem is OVS because I can still SSH into node 2 and then reach node 1 over their private network. The most telling entry I see in the logs is this:
unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
In addition, OVS is using a huge amount of CPU (800% on my 16-core boxes), and when I try to do a clean shutdown it never completes, because it cannot kill ovsdb-server.
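To check whether ovsdb-server is still answering on that socket when the problem hits, something like the following minimal Python 3 sketch can talk to it directly. It is not an OVS or RDO tool: it sends a `list_dbs` request using the OVSDB JSON-RPC protocol (RFC 7047) and returns the database names. The default socket path is the one from the log line above; the function names and the 5-second timeout are my own assumptions.

```python
import json
import socket


def _read_reply(sock, request_id):
    """Read JSON-RPC messages from sock until the one with request_id arrives."""
    decoder = json.JSONDecoder()
    raw = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("ovsdb-server closed the connection")
        raw += chunk
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # a multi-byte character was split across reads
        pos = 0
        while True:
            # Skip any whitespace between consecutive JSON-RPC messages.
            while pos < len(text) and text[pos].isspace():
                pos += 1
            if pos == len(text):
                break
            try:
                msg, pos = decoder.raw_decode(text, pos)
            except ValueError:
                break  # incomplete message; wait for more bytes
            # Ignore anything else, e.g. an "echo" probe sent by the server.
            if isinstance(msg, dict) and msg.get("id") == request_id:
                return msg
        raw = text[pos:].encode("utf-8")  # keep only the unparsed tail


def probe_ovsdb(path="/var/run/openvswitch/db.sock", timeout=5.0):
    """Send an RFC 7047 'list_dbs' request and return the database names.

    Raises socket.timeout / OSError if the server does not answer in time
    or refuses the connection, RuntimeError if it returns an error.
    """
    request_id = 0
    request = {"method": "list_dbs", "params": [], "id": request_id}
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(json.dumps(request).encode("utf-8"))
        reply = _read_reply(sock, request_id)
    if reply.get("error") is not None:
        raise RuntimeError("ovsdb-server returned an error: %r" % (reply["error"],))
    return reply["result"]
```

Run it as root on node 1, e.g. `print(probe_ovsdb())`. A healthy server should return a list that includes `Open_vSwitch`; a timeout or a closed connection would suggest that ovsdb-server itself is wedged, rather than the datapath.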
I have done some Googling and found some old suggestions, based on older OpenStack releases, where people had OVS/kernel version mismatches. As I am running the versions from RDO, I reckon I can rule that out (unless Red Hat have made a massive screw-up).
Has anyone else seen this, or does anyone have any suggestions?
PS: Do not tell me to recompile Open vSwitch; for various reasons, that is not happening in the immediate future.