Which version of OpenStack, and which version of the RDO repo, are you using? I'm merely guessing with so little detail, but it looks as though you are describing some kind of issue between Open vSwitch and your kernel, such as a runaway OVS process. It could also be related to the database or the messaging agent.

Check your qpid logs (in /var/log/messages) for anything that shows a reason for a disconnect at the time your instance lost communication. This could reveal why there are messaging disconnects, and whether they are caused by a messaging connection failure (an external/tertiary cause) or, the other way around, by an OVS disconnect (likely an OVS/kernel build issue).
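A filtered grep is a quick way to pull the relevant lines out of the log. This is a minimal sketch assuming, as above, that qpidd logs to /var/log/messages; the match pattern (qpid, disconnect, a few "connection ..." phrases) is my own guess, so widen or narrow it to whatever your log actually contains.

```shell
#!/bin/sh
# Sketch: show log lines that mention qpid or a (dis)connect event.
# The regex is an assumption; adjust it to the messages your nodes produce.
scan_messaging_log() {
    logfile=${1:-/var/log/messages}
    # -i: case-insensitive, -n: prefix line numbers, -E: extended regex
    grep -inE 'qpid|disconnect|connection (refused|reset|closed)' "$logfile"
}
```

Run it as `scan_messaging_log /var/log/messages | less` and compare the timestamps with the moment the instance dropped off; the same function can be pointed at the Neutron agent logs (typically under /var/log/neutron/) to see which side complained first.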

Since RDO is "...tested on a RHEL 6.4", I would use CentOS 6.4 at a minimum, rather than just "6" as you state. Better still, use 6.5, as a number of components are included in its kernel rather than having to be patched in, as RDO otherwise requires.

Troubleshooting further on your behalf is difficult without logs and details of your configuration, but once you have checked the above, bear in mind that there are well-known Neutron configuration challenges with GRE and MTU settings.
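The GRE/MTU problem is mostly arithmetic: every frame a guest sends through a GRE tunnel is wrapped in extra headers, so the guest MTU has to be smaller than the physical one or large transfers start fragmenting or stalling. The sketch below assumes an IPv4 underlay (20-byte outer IP header), a GRE header carrying a key (8 bytes; OVS uses the key for the tunnel ID) and the guest's own 14-byte Ethernet header travelling inside the tunnel.

```shell
#!/bin/sh
# Sketch: largest guest MTU that fits inside a GRE tunnel without fragmenting.
# Header sizes are assumptions for an IPv4 underlay and keyed GRE.
OUTER_IPV4_HDR=20   # outer IPv4 header added by the tunnel endpoint
GRE_HDR_WITH_KEY=8  # 4-byte base GRE header + 4-byte key (tunnel ID)
INNER_ETH_HDR=14    # guest Ethernet header, carried inside the tunnel

max_guest_mtu() {
    phys_mtu=$1
    echo $((phys_mtu - OUTER_IPV4_HDR - GRE_HDR_WITH_KEY - INNER_ETH_HDR))
}

max_guest_mtu 1500   # prints 1458
```

Under these assumptions a 1500-byte physical MTU leaves 1458 bytes for the guest, so the commonly quoted 1454 sits just under that with a little headroom. If your physical interface runs a smaller MTU, pass that value instead.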

In any case, for a successful OpenStack build (however basic, it is still complicated), you need to start with a supported and up-to-date OS, kernel and OVS. How can you be sure you can rule out an "OVS/kernel version mismatch"? What versions are you using?

I'd suggest you rebuild with the latest CentOS 6.5 and RDO, then, if the issue persists, re-post (with updated details, log files, etc.) here and also on the RDO forum: http://openstack.redhat.com/forum/, where you are more likely to get the distro-specific details you may need.

EDIT: Check your dhcp_agent.ini and dnsmasq configuration for MTU settings, as described in these articles; apparently 1454 should be about right for guest instances when running GRE: http://bderzhavets.blogspot.com.au/2014/01/setting-up-two-physical-node-openstack.html https://ask.openstack.org/en/question/12499/forcing-mtu-to-1400-via-etcneutrondnsmasq-neutronconf-per-daniels/
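For reference, the approach those articles describe comes down to two small changes on the network node: point the DHCP agent at an extra dnsmasq config file, and have dnsmasq push DHCP option 26 (interface MTU) to the guests. The file name dnsmasq-neutron.conf is the usual convention rather than a requirement, and 1454 is simply the value from the articles; treat this as a config sketch to adapt.

```ini
# /etc/neutron/dhcp_agent.ini (network node)
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
# Option 26 = interface MTU; "force" sends it even if the client did not ask for it
dhcp-option-force=26,1454
```

Restart the DHCP agent afterwards (`service neutron-dhcp-agent restart` on CentOS 6) and renew the lease inside the guests, or reboot them, so they pick up the new MTU.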

Don't forget there could still be MTU and GRE issues depending on your kernel and OVS versions, so please tell us which versions you have and update your post; that way it can also help others with similar issues. On both nodes, show the results of:

uname -a

rpm -qa | grep openvswitch
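If it saves some typing, the version checks can be gathered in one go on each node. This is a sketch: /etc/redhat-release, `ovs-vsctl --version` and `modinfo` are standard on a CentOS/RDO node, but each optional command is guarded so the script still runs where one is missing, and the module name `openvswitch` is assumed.

```shell
#!/bin/sh
# Sketch: print the OS, kernel and Open vSwitch versions on this node.
# Commands that are not installed are skipped rather than treated as errors.
collect_versions() {
    echo "== OS release"
    [ -r /etc/redhat-release ] && cat /etc/redhat-release
    echo "== Kernel"
    uname -a
    if command -v rpm >/dev/null 2>&1; then
        echo "== Installed openvswitch packages"
        rpm -qa | grep -i openvswitch
    fi
    if command -v ovs-vsctl >/dev/null 2>&1; then
        echo "== OVS userspace"
        ovs-vsctl --version
    fi
    if command -v modinfo >/dev/null 2>&1; then
        echo "== OVS kernel module"
        # the in-tree module may not report a version line at all
        modinfo openvswitch 2>/dev/null | grep -i '^version'
    fi
    return 0
}

collect_versions
```

Comparing the userspace version against the kernel module version on both nodes is the quickest way to rule the "OVS/kernel version mismatch" in or out.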

Also take a look at your OVS GRE flows and run some tcpdumps in the relevant qrouter namespace while you are making your large 20G transfer. This RDO guide will help: take a look at Joe Talerico's great explanation of GRE debugging on two nodes, from 60 minutes onwards: http://www.youtube.com/watch?v=wEa_8ESxPAY&feature=share&t=1h20s
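The commands involved look roughly like this. The namespace name, the tunnel interface eth1 and the packet counts are placeholders (assumptions), so substitute your own; the only piece that does any parsing is the helper that picks the qrouter namespaces out of `ip netns list`.

```shell
#!/bin/sh
# Sketch of GRE troubleshooting helpers; names and interfaces are placeholders.

# Read `ip netns list` output on stdin and print only the qrouter namespaces.
# Newer iproute2 versions append " (id: N)", so keep just the first field.
qrouter_namespaces() {
    awk '$1 ~ /^qrouter-/ { print $1 }'
}

# Capture inside a router namespace (as root): capture_in_router qrouter-<uuid> [filter]
# ICMP is the default because "fragmentation needed" replies show up there.
capture_in_router() {
    ip netns exec "$1" tcpdump -ni any -c 100 "${2:-icmp}"
}

# Capture GRE (IP protocol 47) on the tunnel interface; eth1 is hypothetical.
capture_gre() {
    tcpdump -ni "${1:-eth1}" -c 100 'ip proto 47'
}

# Show the OpenFlow rules on the tunnel bridge managed by the Neutron OVS agent.
dump_tunnel_flows() {
    ovs-ofctl dump-flows br-tun
}
```

A typical sequence is `ip netns list | qrouter_namespaces`, then `capture_in_router` with the namespace you found, while `capture_gre` runs on the physical interface of the other node, so you can see where packets stop arriving.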

Finally, you also need to check that you aren't being affected by your Generic Receive Offload (GRO) configuration, as per post #24 here: https://bugs.launchpad.net/neutron/+bug/1252900
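GRO is checked and switched with ethtool on the physical interface carrying the tunnel traffic. In this sketch eth1 is a hypothetical interface name, and a change made with `ethtool -K` does not survive a reboot, so if it helps you will need to persist it by whatever means your distribution provides.

```shell
#!/bin/sh
# Sketch: report the Generic Receive Offload (GRO) state of an interface.

# Read `ethtool -k IFACE` output on stdin and print the GRO state ("on"/"off";
# some drivers append "[fixed]", meaning it cannot be changed).
gro_state() {
    awk -F': ' '$1 == "generic-receive-offload" { print $2 }'
}

# Usage on a node (as root), eth1 being a placeholder:
#   ethtool -k eth1 | gro_state
#   ethtool -K eth1 gro off    # not persistent across reboots
```

Re-run the large transfer after disabling GRO on both nodes; if throughput recovers, that points at the offload issue rather than at OVS or the messaging layer.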