Partial DHCP Problems after Upgrade from Folsom to Grizzly (nova-network/Ubuntu 12.04.3)
Hello,
after upgrading an OpenStack deployment from Folsom to Grizzly using the Ubuntu Cloud Archive on Ubuntu 12.04.3 LTS, we are running into problems with DHCP. When hard rebooting suspended instances, the Fedora 19 and CentOS 6.4 instances get an IP without any issues. The existing Ubuntu 12.04 (cloud-image) instances fail to get an IP. Even stranger: new instances from Cirros or Ubuntu get a proper IP and connectivity.
I already found this https://github.com/mseknibilel/OpenStack-Folsom-Install-guide/issues/14 and applied the iptables mangle fix and rebooted the host, but it did not help.
Do you have any ideas where to look next or at least a decent workaround so I can get the existing suspended instances' connectivity back up?
Thanks in advance, Stefan
SOME ADDITIONAL FINDINGS:
We are using FlatDHCP/MultiHost networking, where each host is the DHCP server for its own instances. It became obvious that the DHCP problem does not depend on the guest OS. Instead, the DHCP communication between the instance and the DHCP server is partially blocked: DHCPDISCOVER and DHCPOFFER go through, but DHCPREQUEST does not. The corresponding iptables filter rule contains the wrong host IP. I just checked a running Folsom deployment, and there the IP seems to be the one of the host/DHCP server. There is a nova DB table, instance_info_caches, which contains information about the DHCP servers. This information seems to be used for the libvirt nwfilter XML files, which then seem to build the iptables rules?
For the next steps, could someone please point me in the right direction?
- How can I quickly trigger a rebuild of all OpenStack-related iptables rules on the host (for testing)?
- Can I delete the content of the DB table instance_info_caches, since it is named "cache"? Will Nova rebuild the cache data, namely the right host IPs for the DHCP servers?
- Where is it decided which IP a host gets on the bridge? The IPs seem random.
- Can you point me to documentation on nova-network, especially the DHCP part?
- Is there a best practice regarding pre-configured bridges on Ubuntu?
Many questions, I know. I would be glad if at least some of them got an answer.
Thanks
Update:
I narrowed it down even further: on every host there is a rule which basically allows only the host itself as DHCP server for its instances. One example (the host has the IP 10.101.0.46, so that would be the correct address in the rule):
-A nova-compute-inst-393 -s 10.101.0.55/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
This rule is already wrong, but after issuing a 'service nova-compute restart' it changes into
-A nova-compute-inst-393 -s 10.101.0.8/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
This is also wrong; 10.101.0.46 is the host running the right dnsmasq process.
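To spot this mismatch quickly across many instance chains, a small check along these lines might help. It is only a sketch: it assumes the rules are available as text in the format shown above (for example from iptables-save), the function name find_wrong_dhcp_rules is made up for illustration, and the second sample rule (inst-394) is hypothetical.

```python
import re

# Matches a per-instance rule that accepts DHCP server replies (udp 67 -> 68)
# and captures the chain name and the allowed source address.
DHCP_RULE = re.compile(
    r"^-A (?P<chain>nova-compute-inst-\d+) -s (?P<src>\d+\.\d+\.\d+\.\d+)(?:/32)? "
    r".*--sport 67 --dport 68 -j ACCEPT$"
)


def find_wrong_dhcp_rules(rules_text, expected_ip):
    """Return (chain, source_ip) pairs whose DHCP source differs from expected_ip."""
    wrong = []
    for line in rules_text.splitlines():
        match = DHCP_RULE.match(line.strip())
        if match and match.group("src") != expected_ip:
            wrong.append((match.group("chain"), match.group("src")))
    return wrong


# First rule is the one quoted above; the second one is a hypothetical
# "correct" rule added only so the sample contains both cases.
SAMPLE = """\
-A nova-compute-inst-393 -s 10.101.0.55/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
-A nova-compute-inst-394 -s 10.101.0.46/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
"""

if __name__ == "__main__":
    # 10.101.0.46 is the host that runs the dnsmasq process for these instances.
    for chain, src in find_wrong_dhcp_rules(SAMPLE, "10.101.0.46"):
        print(chain, "allows DHCP replies only from", src)
```

The check only reads rule text and changes nothing on the host, so it can be run safely before and after a 'service nova-compute restart' to see which chains are affected.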
The data comes from, and is changed in, the instance_info_caches table of the nova database, where it is stored as dhcp_server within the network_info column. To me it seems ...
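For comparing the cached value with the host's real address, something like the following could pull every dhcp_server entry out of a network_info value. This is a sketch under assumptions: it treats network_info as JSON text, it deliberately searches for the key at any depth because the exact nesting is not shown here, and the sample value (including its cidr) is a hypothetical, heavily trimmed stand-in rather than real data from the database.

```python
import json


def find_dhcp_servers(node):
    """Recursively collect every value stored under a 'dhcp_server' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "dhcp_server":
                found.append(value)
            else:
                found.extend(find_dhcp_servers(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_dhcp_servers(item))
    return found


# Hypothetical, trimmed network_info value; the real column holds far more
# fields and its nesting may differ.
SAMPLE_NETWORK_INFO = (
    '[{"network": {"subnets": [{"cidr": "10.101.0.0/24", '
    '"meta": {"dhcp_server": "10.101.0.8"}}]}}]'
)

if __name__ == "__main__":
    servers = find_dhcp_servers(json.loads(SAMPLE_NETWORK_INFO))
    print("cached dhcp_server values:", servers)
```

Searching by key name instead of a fixed path keeps the sketch independent of the exact layout of the cached structure.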
Sounds like a frustrating time. Did you see http://docs.openstack.org/trunk/openstack-ops/content/network_troubleshooting.html yet?
Yes, that is how I figured out the DHCP problems and that only a fraction of the DHCP packets are let through. But thanks anyway.