DHCP request does not reach tapXXX (qdhcp-XXX) network interface

asked 2015-03-02 15:55:33 -0500

Luis Bravo

updated 2015-03-07 08:00:26 -0500

To learn a little about OpenStack, I have installed 3 VMs using libvirt/KVM. I followed the install guide at http://docs.openstack.org/juno/install-guide/install/yum/content/

I am using CentOS 7, OpenStack Juno and neutron with openvswitch. One VM is the controller, another is the network node and the third is the compute node. I am using a cirros image (cirros-0.3.3-x86_64).

Right now I can start an instance, but DHCP does not work.
If I manually configure the IP address that DHCP allocated to the instance, it works: I can access the instance using ssh from outside, and the instance can access the external network.

I found that the instance's DHCP request is reaching the network node on the wrong interface:

[root@network neutron]# ip netns
qrouter-89fe919a-0659-4e18-a609-aa698a110c9c
qdhcp-2a920ddf-60ac-4e9f-8731-ca64b02f37df

[root@network neutron]# ip netns exec **qrouter-89fe919a-0659-4e18-a609-aa698a110c9c** ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
11: **qr-27520527-5c**: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether fa:16:3e:b0:3d:2c brd ff:ff:ff:ff:ff:ff
12: qg-79330b81-77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether fa:16:3e:56:e3:d2 brd ff:ff:ff:ff:ff:ff

[root@network neutron]# ip netns exec qrouter-89fe919a-0659-4e18-a609-aa698a110c9c tcpdump -n -i qr-27520527-5c
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-27520527-5c, link-type EN10MB (Ethernet), capture size 65535 bytes
**(at this point I ran /etc/init.d/S40network start at instance console)**
18:38:27.335127 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:63:f5:86, length 290
18:39:27.414937 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:63:f5:86, length 290

"fa:16:3e:63:f5:86" is the instance MAC address.
So the DHCP request reaches the network node, but on the wrong interface.

Dnsmasq is listening on the tap8dc583ac-23 interface, which is in the qdhcp-XXX namespace.

[root@network neutron]# ip netns exec **qdhcp-2a920ddf-60ac-4e9f-8731-ca64b02f37df** ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
9: **tap8dc583ac-23**: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether fa:16:3e:69:91:f2 brd ff:ff:ff:ff:ff:ff

As far as I know, the correct interface is tap8dc583ac-23, so there may be something wrong in the openvswitch configuration.

Thanks for any help


Comments

Please enable dnsmasq logging and provide dnsmasq.log

dbaxps (2015-03-05 09:21:40 -0500)

Add this line to dhcp_agent.ini:
dnsmasq_config_file = /etc/neutron/dnsmasq.conf
Then create /etc/neutron/dnsmasq.conf containing:
log-facility = /var/log/neutron/dnsmasq.log
log-dhcp

dbaxps (2015-03-05 10:30:46 -0500)

The DHCP request is a broadcast, so it is not wrong that it is seen on the router interface. My guess is that the neutron OVS agent failed to bind the DHCP tap, so the port does not have the right VLAN tag on br-int. Check the neutron OVS agent log and the output of ovs-vsctl show: a tag of 4095 is a dead VLAN.

darragh-oreilly (2015-03-05 13:48:21 -0500)
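The check suggested in the comment above can be sketched as a small filter over the text printed by ovs-vsctl show. This is only an illustration: dead_vlan_ports is a hypothetical helper name, and it assumes the usual layout in which each Port line is followed by an indented "tag: N" line.

```shell
#!/usr/bin/env bash
# Sketch: list ports that carry the dead VLAN tag 4095.
# dead_vlan_ports is a hypothetical helper, not part of Open vSwitch.
# It reads the output of `ovs-vsctl show` on stdin and prints one
# port name per line.
dead_vlan_ports() {
    awk '
        $1 == "Port" {                  # remember the most recent port name
            port = $2
            gsub(/"/, "", port)         # strip the quotes ovs-vsctl may add
        }
        $1 == "tag:" && $2 == "4095" {  # 4095 marks a dead VLAN
            print port
        }
    '
}
```

On the network node it would be used as `ovs-vsctl show | dead_vlan_ports`. An empty result means no port is tagged 4095.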

4 answers


answered 2015-03-06 15:05:04 -0500

Luis Bravo

Well, after more than a week of trying to fix this problem, it suddenly got fixed, or at least I now have a workaround. I had been messing around with the neutron configuration and had removed all networks. To collect the logs dbaxps asked me for, I of course had to reconfigure the networks. When I launched an instance to generate the logs, the instance got its IP! Some strange error (a bug?) during the initial install must have left the initial DHCP port in an error state.

dbaxps, you said that 4095 in the ovs-vsctl show output indicates a dead VLAN. I remember seeing this on the original DHCP port, the one that was not working before I removed the initial network config. The current port shows "tag: 2". Also, the original port showed "binding:vif_type binding_failed" in the neutron port-show <dhcp-port-id> output. The current one shows "binding:vif_type ovs".

I have already rebooted all machines many times and launched/stopped the instance many times, and the problem hasn't happened again. I also removed and reconfigured all networks again without any problem. I don't have an explanation, but everything is working fine now.
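For anyone wanting to repeat the binding check described in this answer, here is a minimal sketch that pulls the binding:vif_type value out of the table printed by neutron port-show. The helper name port_vif_type is hypothetical, and the sketch assumes the usual two-column "| Field | Value |" table layout.

```shell
#!/usr/bin/env bash
# Sketch: extract binding:vif_type from `neutron port-show` output.
# port_vif_type is a hypothetical helper, not part of the neutron client.
# It reads the port-show table on stdin and prints the value column,
# e.g. "ovs" or "binding_failed".
port_vif_type() {
    awk -F'|' '
        {
            field = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", field)   # trim column padding
            gsub(/^[ \t]+|[ \t]+$/, "", value)
        }
        field == "binding:vif_type" { print value }
    '
}
```

It would be used as `neutron port-show <dhcp-port-id> | port_vif_type`. A result of binding_failed corresponds to the broken state described above, and ovs to the working one.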


Comments

darragh-oreilly warned you about tag 4095, not me. Regarding the fix: recreating the private networks very often brings DHCP back to life (gets dnsmasq active). You are not the first person to experience this miracle.

dbaxps (2015-03-06 15:14:03 -0500)

answered 2015-03-05 09:16:48 -0500

I think I am having the same problem.

I'll try to explain it in a different way:

NETWORK NODE:

 [root@network ~]# ip netns
 qrouter-02ec0122-0fe7-4f6e-bb90-e00655fc9f22
 qdhcp-daf16c13-c18e-4ad7-b0a6-cf625da81b5d

[root@network ~]# ip netns exec qdhcp-daf16c13-c18e-4ad7-b0a6-cf625da81b5d ip a  |grep tap
9: tapd83ea76c-c3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    inet 192.168.1.3/24 brd 192.168.1.255 scope global tapd83ea76c-c3

COMPUTE NODE:

[root@compute1 ~]# virsh dumpxml instance-00000009 |grep tap
      <target dev='tapc7e97925-58'/>

How can it be that the name of the tap interface generated by neutron is different from the one used by the instance?

Consequently, as in Luis's case, the VM tries to connect to the wrong interface and does not get an IP from DHCP.

Does anybody know how to solve this?

(My setup is exactly the one recommended in the documentation: CentOS 7, Juno, 5 VMs ==> controller, network, compute, block, object)


answered 2017-04-14 13:40:42 -0500

Jackbmg

updated 2017-04-14 14:07:03 -0500

I am having the same issue. In my case, when the problem occurs, I am seeing the DHCPDISCOVER coming from the compute to the network (controller) node, and a DHCPOFFER returned, but it doesn't make it back to the compute node.

When it does work, I see the DHCPREQUEST going back to the network node, and the subsequent DHCPACK. (Enable dnsmasq logging to see this outside of the syslog.)

However, there does not seem to be any consistency. Sometimes it works, sometimes not. I tried on a new subnet: same thing. Yet on an older subnet it works. My controller and compute nodes are on a trunked interface that allows all the VLANs specified in /etc/neutron/plugins/ml2/ml2_conf.ini:

network_vlan_ranges = physnet2:30:33

I also have dhcp snooping enabled, but have trusted the interface for the controller node, so dhcp responses are allowed. However, for some reason, the DHCPOFFER doesn't make it back to the compute node when the problem occurs.
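To see how far the DHCP exchange gets, the dnsmasq log can be summarised by message type. This is a rough sketch: dhcp_stage_counts is a hypothetical helper name, and it assumes dnsmasq logging is enabled with log-dhcp so that the log contains the DHCPDISCOVER, DHCPOFFER, DHCPREQUEST and DHCPACK lines mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: count DHCP message types seen in a dnsmasq log.
# dhcp_stage_counts is a hypothetical helper, not part of dnsmasq.
# It reads log text on stdin and prints lines such as "DHCPOFFER=3".
dhcp_stage_counts() {
    grep -oE 'DHCP(DISCOVER|OFFER|REQUEST|ACK|NAK)' \
        | sort \
        | uniq -c \
        | awk '{ print $2 "=" $1 }'   # uniq -c prints "count name"
}
```

With the log path from the earlier comment it would be used as `dhcp_stage_counts < /var/log/neutron/dnsmasq.log`. Counts for DHCPDISCOVER and DHCPOFFER with no DHCPREQUEST or DHCPACK would match the failing case described in this answer, where the offer never reaches the compute node.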


answered 2015-03-09 13:28:56 -0500

rlrevell

I have been having the exact same problem - DHCP failing for instances, DHCP requests not seen by the qdhcp-xxx interface on the network node (but visible GRE encapsulated at eth1) and the VLAN showing as 4095 in ovs-vsctl. Deleting and recreating the instance networks solved the problem.



Stats

Asked: 2015-03-02 15:55:33 -0500

Seen: 2,669 times

Last updated: Apr 14 '17