Neutron network node: DHCP requests reach the tap interface but not dnsmasq for some tenants, while others work
I am struggling with a Neutron network node that does not hand out IP addresses in certain tenants. After a controller reboot some tenants start to work while others stop, so the issue does not seem to be related to any particular setup in a tenant.
In both cases I see DHCP requests arriving on the tap interface and the dnsmasq process running correctly for the tenant, but for some tenants no DHCP replies are seen and nothing seems to be delivered to the dnsmasq process. Neither restarting neutron-dhcp-agent nor killing the dnsmasq process (and then restarting neutron-dhcp-agent) helps.
I am out of ideas... any help is greatly appreciated!
Non-working tenant:
ip netns exec qdhcp-363efdba-9c30-4305-8a6a-7639a66a13fd ip a
...
190: tap51abb5e4-c1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
...
ip netns exec qdhcp-363efdba-9c30-4305-8a6a-7639a66a13fd tcpdump port 67 or port 68 -i tap51abb5e4-c1 -nnvvvNXs 512
... I see DHCP requests, but no DHCP reply
ps -efa | grep tap51abb5e4-c1
nobody 23933 1 0 08:00 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap51abb5e4-c1 --except-interface=lo --pid-file=/var/run/neutron/dhcp/363efdba-9c30-4305-8a6a-7639a66a13fd/pid --dhcp-hostsfile=/var/run/neutron/dhcp/363efdba-9c30-4305-8a6a-7639a66a13fd/host --addn-hosts=/var/run/neutron/dhcp/363efdba-9c30-4305-8a6a-7639a66a13fd/addn_hosts --dhcp-optsfile=/var/run/neutron/dhcp/363efdba-9c30-4305-8a6a-7639a66a13fd/opts --leasefile-ro --dhcp-range=set:tag0,192.168.2.0,static,172800s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq/dnsmasq-neutron.conf --domain=openstacklocal
Attaching strace to the process shows that no packets are arriving:
strace -p 23933 -e network,write -s 4096
I have noticed that no tag is assigned in OVS (example of two non-working networks):
Port "tap51abb5e4-c1"
    Interface "tap51abb5e4-c1"
        type: internal
Port "tap5d9b6dc3-32"
    Interface "tap5d9b6dc3-32"
        type: internal
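One way to list the affected ports at a glance is to filter the ovs-vsctl show output for tap ports that have no tag line. This is only a rough sketch: untagged_taps is a hypothetical helper name, and it assumes the usual ovs-vsctl show layout, where a tagged port prints a "tag:" line under its Port entry.

```shell
# Hypothetical helper: reads `ovs-vsctl show` output on stdin and prints
# the names of tap ports that have no "tag:" line under their Port entry.
untagged_taps() {
    awk '
        # Print the port collected so far if it is a tap port without a tag.
        function report() { if (port ~ /^tap/ && !tagged) print port }
        # A new Port entry starts: report the previous one, then reset state.
        $1 == "Port" { report(); port = $2; gsub(/"/, "", port); tagged = 0 }
        # A tag line belongs to the most recent Port entry.
        $1 == "tag:" { tagged = 1 }
        END { report() }
    '
}
```

Usage on the network node would be: ovs-vsctl show | untagged_taps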
Doing the same for a working tenant:
I see both the request and the offer, and strace shows packets reaching the process:
strace -p 24505 -e network,write -s 4096
Process 24505 attached
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_iov(1)=[{"\1\1\6\0\0022\341M\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\372\26>\37.\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 ...
Search the OVS agent logs for these taps to find out why they don't have a tag.
Thanks!
I can confirm that the tags are causing this problem. In the logs I can see the ovs-vsctl command that creates the tap interfaces for all four. But later, only for the working ones, I get:
/usr/bin/ovs-vsctl --timeout=10 set Port tapb8d8b645-21 tag=10
How is the tag defined, and when and by what is it configured? Where should I look?
Search the neutron logs (server and OVS agent) for the IDs (e.g. 51abb5e4) to see if they say why the binding fails. I believe there is a bug where, if the port binding fails once, the port stays unbindable forever.
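A minimal sketch of that log search. It assumes the server and agent logs live under /var/log/neutron, which is a common default but is not confirmed in this thread. find_port_in_logs is a hypothetical helper name.

```shell
# Hypothetical helper: print every log line (with file name and line number)
# that mentions the given port ID prefix.
# The default directory is an assumption; pass another one as the 2nd argument.
find_port_in_logs() {
    port_id="$1"
    log_dir="${2:-/var/log/neutron}"
    grep -rn -- "$port_id" "$log_dir"
}
```

For example: find_port_in_logs 51abb5e4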
With the admin credentials sourced, can you provide the output of
neutron port-show <DHCP-PORT-ID>
You can find DHCP-PORT-ID by running
neutron port-list --device_owner network:dhcp
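When reading the port-show output, the binding:vif_type field is usually the quickest thing to check: a port whose binding failed typically reports binding_failed there. Below is a small sketch that pulls that field out of the table output. vif_type is a hypothetical helper name, and it assumes the default table format printed by the neutron client.

```shell
# Hypothetical helper: reads the `neutron port-show` table on stdin and
# prints the value of the binding:vif_type row, with padding removed.
vif_type() {
    awk -F'|' '$2 ~ /binding:vif_type/ { gsub(/[ \t]/, "", $3); print $3 }'
}
```

Usage: neutron port-show <DHCP-PORT-ID> | vif_type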
I have edited the question and added the port-show output for all DHCP ports in my environment (I had to add it as text because upload was not available to me). That includes some working and some non-working ports; I have not spotted any differences between them.
Thanks for your help so far.