Queens DHCP Agent stops providing addresses on some networks

asked 2019-01-11 10:25:33 -0500

Rumbles

We have been using OpenStack for about a year, and in the last few days we have started having major trouble. When we start a server, the DHCP agent appears to be running on that network, but the server does not get an IP address. The problem is sporadic: a single DHCP agent serves all of our networks, and instances on some networks get addresses while instances on others do not.

We tried running neutron-dhcp-agent in debug mode, but it did not produce any useful logs.

$ grep -v -e ^# -e ^$ /etc/neutron/dhcp_agent.ini
[DEFAULT]
interface_driver = linuxbridge
debug = false
dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
enable_isolated_metadata = True
[agent]
[ovs]
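The listing above has `debug = false`, so the agent was not in debug mode at the time it was taken. A minimal sketch for switching it back on, assuming GNU `sed` (as shipped on Ubuntu) and a single `debug` line as in the listing; `enable_dhcp_debug` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: set "debug = True" in a dhcp_agent.ini.
# Assumes the key already exists on its own line, as in the listing above.
enable_dhcp_debug() {
  ini="${1:-/etc/neutron/dhcp_agent.ini}"
  # Rewrite any existing "debug = ..." line in place (GNU sed -i).
  sed -i 's/^debug[[:space:]]*=.*/debug = True/' "$ini"
}
```

Run it as root and then restart neutron-dhcp-agent so the agent re-reads the file.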

We disabled DHCP on all of our networks, removed the agent from each network, and then re-enabled DHCP on each. That appears to fix the issue for a few minutes, but then we are back to square one.
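For reference, the disable/unschedule/re-enable cycle might look like this with python-openstackclient. This is a sketch: `cycle_dhcp` and its arguments are hypothetical, it handles one subnet per call, and it assumes default automatic DHCP scheduling, so that re-enabling DHCP puts the network back on an agent.

```shell
#!/bin/sh
# Hypothetical arguments: $1 = DHCP agent ID, $2 = network, $3 = one of its subnets.
cycle_dhcp() {
  agent="$1"; net="$2"; subnet="$3"
  openstack subnet set --no-dhcp "$subnet"                       # disable DHCP
  openstack network agent remove network --dhcp "$agent" "$net"  # unschedule the agent
  openstack subnet set --dhcp "$subnet"                          # re-enable DHCP
  # Assuming default scheduling, check which agent now hosts the network.
  openstack network agent list --network "$net"
}
```

The agent ID can be taken from `openstack network agent list --agent-type dhcp`, which also shows whether the agent is reported as alive.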

We tried restarting all the Neutron services, and neutron-dhcp-agent then gets stuck in a startup loop for some time.
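When the agent loops on startup, the reason is usually visible in systemd's view of the service or in the agent's own log. A sketch of where to look, assuming the Ubuntu packages' default log path `/var/log/neutron/neutron-dhcp-agent.log`; `dhcp_agent_logs` is a hypothetical helper name:

```shell
#!/bin/sh
# Gather the places that usually explain a restart loop.
dhcp_agent_logs() {
  systemctl status neutron-dhcp-agent --no-pager       # current state and last exit status
  journalctl -u neutron-dhcp-agent -n 50 --no-pager    # recent start/stop messages
  tail -n 100 /var/log/neutron/neutron-dhcp-agent.log  # the agent's own log (assumed path)
}
```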

I'm struggling even to debug the issue. What are the best steps to debug this problem?

systemd is also reporting some errors, but I'm not sure they are related:

[/lib/systemd/system/neutron-dhcp-agent.service:12] Runtime directory is not valid, ignoring assignment: neutron lock/neutron
[/lib/systemd/system/neutron-dhcp-agent.service:13] Unknown lvalue 'CacheDirectory' in section 'Service'
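These warnings suggest the unit file was written for a newer systemd than the host runs: `CacheDirectory=` was introduced in systemd 235, and older versions log "Unknown lvalue" and skip the line. The `RuntimeDirectory=` warning plausibly has the same cause, though that is an assumption. If so, systemd simply ignores those lines, so the warnings are probably not the cause of the DHCP problem. A sketch for comparing versions; `systemd_major` and `supports_cache_directory` are hypothetical helper names:

```shell
#!/bin/sh
# Print the systemd major version; the first line of
# `systemctl --version` looks like "systemd 229".
systemd_major() {
  systemctl --version | awk 'NR == 1 { print $2 }'
}

# $1: systemd major version. Succeeds if CacheDirectory= is supported (235+).
supports_cache_directory() {
  [ "$1" -ge 235 ]
}
```

Usage: `supports_cache_directory "$(systemd_major)" || echo "systemd predates CacheDirectory="`.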

The agent we're using is from the Ubuntu repositories:

neutron-dhcp-agent                   2:12.0.5-0ubuntu1~cloud0

Comments

I would look into two things: is there connectivity from the DHCP network namespace to the instance's static IP, and is anything logged by the dnsmasq processes that implement your DHCP servers?

Regarding "the dhcp-agent gets stuck in a startup loop": is there anything in the agent's log file?

Bernd Bausch (2019-01-11 19:32:13 -0500)
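The two checks suggested in the comment can be sketched as follows. This assumes Neutron's convention of running each network's dnsmasq in a namespace named `qdhcp-<network UUID>`, with the network UUID appearing on dnsmasq's command line; `dhcp_netns` and `check_dhcp_path` are hypothetical helper names. By default dnsmasq logs its DHCP transactions to syslog, so `grep dnsmasq /var/log/syslog` is a reasonable first place to look for the second check.

```shell
#!/bin/sh
# Name of the namespace Neutron uses for a network's DHCP server.
dhcp_netns() {
  printf 'qdhcp-%s\n' "$1"
}

# $1: network UUID, $2: the instance's fixed IP address.
check_dhcp_path() {
  ns=$(dhcp_netns "$1")
  ip netns list | grep -q "^$ns" || { echo "namespace $ns missing"; return 1; }
  ip netns exec "$ns" ip addr show    # is the DHCP port up with an address?
  ip netns exec "$ns" ping -c 3 "$2"  # can the namespace reach the instance?
  pgrep -af "dnsmasq.*$1"             # is a dnsmasq running for this network?
}
```

Run as root on the node hosting the DHCP agent.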

I didn't find anything useful in either the dnsmasq logs or the agent log, but after a colleague restarted the neutron-linuxbridge-cleanup service, the DHCP server seems to be working again.
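Since restarting neutron-linuxbridge-cleanup helped, it may be worth checking whether the DHCP port is still plugged into the network's bridge the next time addresses stop being handed out. A sketch, assuming the Linux bridge agent's naming convention of `brq` followed by the first 11 characters of the network UUID; `bridge_for_network` and `dhcp_bridge_ports` are hypothetical helper names:

```shell
#!/bin/sh
# Assumed Linux bridge agent convention: "brq" + first 11 chars of the network UUID.
bridge_for_network() {
  printf 'brq%.11s\n' "$1"
}

# $1: network UUID. List interfaces attached to that network's bridge;
# the tap device of the DHCP namespace should be among them.
dhcp_bridge_ports() {
  ip link show master "$(bridge_for_network "$1")"
}
```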

Rumbles (2019-01-14 07:02:32 -0500)