Partial DHCP Problems after Upgrade from Folsom to Grizzly (nova-network/Ubuntu 12.04.3)

asked 2013-08-25 02:35:01 -0500

sschlott

updated 2013-08-25 17:26:18 -0500

Hello,

After upgrading an OpenStack deployment from Folsom to Grizzly using the Ubuntu Cloud Archive on Ubuntu 12.04.3 LTS, we are encountering problems with DHCP. When hard rebooting suspended instances, the Fedora 19 and CentOS 6.4 instances get an IP without any issues, but the existing Ubuntu 12.04 (cloud-image) instances fail to get one. To make it even stranger, new instances launched from Cirros or Ubuntu images get a proper IP and connectivity.

I already found this https://github.com/mseknibilel/OpenStack-Folsom-Install-guide/issues/14 and applied the iptables mangle fix and rebooted the host, but it did not help.

Do you have any ideas where to look next or at least a decent workaround so I can get the existing suspended instances' connectivity back up?

Thanks in advance, Stefan

SOME ADDITIONAL FINDINGS:

We are using FlatDHCP/multi-host networking, where each host is the DHCP server for its own instances. It became obvious that the DHCP problem does not depend on the guest OS. Instead, the DHCP communication from the instance back to the DHCP server is blocked. DHCPDISCOVER and DHCPOFFER go through, but not DHCPREQUEST. The corresponding iptables filter rule contains the wrong host IP. I just checked a running Folsom deployment; there the IP appears to be that of the host/DHCP server. There is a nova DB table, instance_info_caches, which contains information about the DHCP servers. This information seems to be used for the libvirt nwfilter XML files, which then seem to build the iptables rules?

For the next steps, could someone please point me in the right direction?

  • How can I quickly trigger a rebuild of all OpenStack-related iptables rules on the host (for testing)?
  • Can I delete the content of the DB table instance_info_caches, since it is named a cache, and will Nova rebuild the cache data, namely the right host IPs for the DHCP servers?
  • Where is it decided which IP a host gets on the bridge? The IPs seem random.
  • Can you point me to documentation on nova-network, especially the DHCP part?
  • Is there a best practice regarding pre-configured bridges on Ubuntu?

Many questions, I know. I would be glad if at least some of them got an answer.

Thanks

Update:

I narrowed it down even further. On every host there is a rule that basically allows only the host itself as DHCP server for its instances. One example (the host has the IP 10.101.0.46, which would be the correct address in the rule):

-A nova-compute-inst-393 -s 10.101.0.55/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT

This rule is already wrong, but after issuing a 'service nova-compute restart' it changes into

-A nova-compute-inst-393 -s 10.101.0.8/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT

Also wrong; 10.101.0.46 is the host running the right dnsmasq process.
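To check every instance chain on a host at once rather than one rule at a time, the comparison described above can be sketched in Python. This is only an illustration: the function name find_wrong_dhcp_rules is made up for this example, the regular expression assumes the rules look exactly like the ones quoted above, and the second sample rule (inst-394) is hypothetical.

```python
import re

# Matches the DHCP-reply ACCEPT rule format quoted in the question.
DHCP_RULE = re.compile(
    r"^-A (?P<chain>nova-compute-inst-\d+) "
    r"-s (?P<source>\d+\.\d+\.\d+\.\d+)(?:/32)? "
    r".*--sport 67 --dport 68 -j ACCEPT$"
)


def find_wrong_dhcp_rules(rules_text, expected_ip):
    """Return (chain, source_ip) pairs whose DHCP source is not expected_ip."""
    mismatches = []
    for line in rules_text.splitlines():
        match = DHCP_RULE.match(line.strip())
        if match and match.group("source") != expected_ip:
            mismatches.append((match.group("chain"), match.group("source")))
    return mismatches


# First rule is taken from the question; the second one is hypothetical.
sample = """\
-A nova-compute-inst-393 -s 10.101.0.55/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
-A nova-compute-inst-394 -s 10.101.0.46/32 -p udp -m udp --sport 67 --dport 68 -j ACCEPT
"""

# The host's own address (10.101.0.46) is the one the rules should contain.
print(find_wrong_dhcp_rules(sample, "10.101.0.46"))
```

On a real host the text would come from the output of iptables-save (run as root) instead of the sample string.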

The data comes from, and is changed in, the nova database table instance_info_caches, where it is stored as dhcp_server within the network_info column. To me it seems ...
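For inspecting what a cache row actually holds, a small Python sketch can pull every dhcp_server value out of a network_info value. It assumes the column content is JSON text; because the exact layout is not shown in this thread, the helper (find_dhcp_servers, a name invented here) simply walks the whole structure, and the sample value is hypothetical and heavily simplified.

```python
import json


def find_dhcp_servers(node):
    """Recursively collect every value stored under a 'dhcp_server' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "dhcp_server":
                found.append(value)
            else:
                found.extend(find_dhcp_servers(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_dhcp_servers(item))
    return found


# Hypothetical network_info value; the real layout in nova may differ.
raw = (
    '[{"network": {"subnets": '
    '[{"cidr": "10.101.0.0/24", "dhcp_server": "10.101.0.55"}]}}]'
)
network_info = json.loads(raw)
print(find_dhcp_servers(network_info))
```

Comparing the collected values with the host's own bridge address would show which cache rows carry a wrong DHCP server.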


Comments

fifieldt (2013-08-25 17:23:56 -0500)

Yes, that is how I figured out the DHCP problems and that only a fraction of the DHCP packets are let through. But thanks anyway.

sschlott (2013-08-25 17:29:02 -0500)

1 answer

answered 2013-08-26 08:34:47 -0500

sschlott

Hi, I just found the bug. It was apparently introduced with Grizzly and fixed in Grizzly 2013.1.3, but the fix is not yet in the Ubuntu Cloud Archive package, so bad luck.

The bug descriptions are here:

https://bugs.launchpad.net/nova/+bug/1092347

https://bugs.launchpad.net/nova/+bug/1194178

I temporarily fixed the problem by patching nova/virt/firewall.py in the following way:

import netifaces as ni

and in the method _do_dhcp_rules, which creates the faulty iptables rule, I assigned the local IP of br100 as dhcp_servers:

host_local_ip = ni.ifaddresses('br100')[ni.AF_INET][0]['addr']

dhcp_servers = [ host_local_ip ]
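The lookup above fails with an exception if br100 has no IPv4 address. A slightly more defensive variant could look like the following sketch. It is not part of nova or of the posted patch: first_ipv4 is a hypothetical helper, and it works on a plain dict in the shape that netifaces returns for an interface, so the example runs without netifaces or a real bridge.

```python
import socket


def first_ipv4(addresses):
    """Return the first IPv4 address from a netifaces-style mapping, or None.

    `addresses` is assumed to look like the result of
    netifaces.ifaddresses('br100'): {address_family: [{'addr': ...}, ...]}.
    """
    entries = addresses.get(socket.AF_INET, [])
    for entry in entries:
        addr = entry.get("addr")
        if addr:
            return addr
    return None


# Hypothetical data standing in for ni.ifaddresses('br100').
br100 = {socket.AF_INET: [{"addr": "10.101.0.46", "netmask": "255.255.255.0"}]}

host_local_ip = first_ipv4(br100)
# Fall back to an empty list instead of raising when no address is found.
dhcp_servers = [host_local_ip] if host_local_ip else []
print(dhcp_servers)
```

With an empty list no DHCP ACCEPT rule would be generated, so whether that or an explicit error is preferable depends on the deployment.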

It is just a quick fix and will hopefully no longer be necessary once we update to 2013.1.3, but it is written down here in case someone else encounters the problem. The basic idea comes from Sam Morrison, who mentioned it in the second bug report thread.

Everything now runs as expected with a shiny upgraded Grizzly deployment ;)


Comments

Thanks so much for posting your detailed steps, this is sure to help a lot of people :) Very glad you got it working!

fifieldt (2013-08-26 12:57:07 -0500)

Note that this is still an issue in Havana; we still carry a hack in our nova to fix it.

sorrison (2013-11-27 20:36:37 -0500)
