
network issues: instance cannot talk to 10.0.0.1

asked 2014-01-05 14:43:56 -0500

Mathias Ewald

updated 2014-01-07 08:14:36 -0500

Hi, I am running my cloud with VMwareESXDriver, with nova-network running on the compute node for the one ESXi host I have. My instance (cirros) boots successfully and receives the correct IP, which I can ping from the controller node since it has an interface in the 10.0.0.0/24 network. But from the instance I cannot ping the default gateway 10.0.0.1 (br100 on the compute node). Sniffing eth1 (the uplink for br100) I can see the ARP request but no response; sniffing on br100 I see nothing at all. Any ideas on how to proceed?

root@compute1:~# brctl show
bridge name     bridge id               STP enabled     interfaces
br100           8000.00505601006b       no              eth1
root@compute1:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:50:56:01:00:6b  
          inet6 addr: fe80::250:56ff:fe01:6b/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:6402 errors:0 dropped:9 overruns:0 frame:0
          TX packets:175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:483429 (483.4 KB)  TX bytes:59066 (59.0 KB)

root@compute1:~# ifconfig br100
br100     Link encap:Ethernet  HWaddr 00:50:56:01:00:6b  
          inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::74a3:42ff:fe32:2fbe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6259 errors:0 dropped:0 overruns:0 frame:0
          TX packets:865 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:374154 (374.1 KB)  TX bytes:87878 (87.8 KB)

root@compute1:~#
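
A quick way to double-check the bridge side (standard bridge-utils commands, nothing nova-specific) is to look at which MAC addresses br100 has learned and whether the eth1 port is in forwarding state:

# check which MAC addresses the bridge has learned and on which port they were seen
brctl showmacs br100
# check that the eth1 port is actually in forwarding state
brctl showstp br100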

The instance can reach any other IP on the network, so it has to be something on the compute node. Looking at ebtables, this behavior makes a lot more sense: http://pastebin.com/8bZJG6zi By "sense" I mean that I now know why it behaves like this, but it still confuses me, to be honest: I was thinking that nova-network (in my case, the compute node) is what all traffic goes through (the default gateway). Nova-network assigning br100 the IP 10.0.0.1 would confirm this assumption. But the ebtables rules look as if 10.0.0.1 on br100 is not meant to be visible on the network at all. Can anyone shine a light on this?
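
For reference, listing the rules together with their hit counters makes it easier to see which rule is actually matching the ARP traffic (plain ebtables options, nothing specific to this setup):

# list the filter-table rules with packet counters; re-running this while the
# instance is ARPing for 10.0.0.1 shows which DROP rule the counters go up on
ebtables -t filter -L --Lc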

Sniffing on eth1 (the br100 uplink), I can see the ARP requests coming in:

root@compute1:~# tcpdump -i eth1 
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:59:28.064071 ARP, Request who-has 10.0.0.1 tell 10.0.0.3, length 46
15:59:29.065973 ARP, Request who-has 10.0.0.1 tell 10.0.0.3, length 46
15:59:30.063757 ARP, Request who-has 10.0.0.1 tell 10.0.0.3, length 46
15:59:31.064268 ARP, Request who-has ...
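
To compare the two sides, the same capture can be narrowed down to ARP with standard tcpdump filters; on br100 nothing shows up at all, which matches what I described above:

# ARP only, no name resolution, include link-level headers
tcpdump -n -e -i br100 arp
tcpdump -n -e -i eth1 arp and host 10.0.0.1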

Comments

Please post one question at a time: this site is a lot more useful if there is one answer per problem/question. You can edit the question to add details and make it more readable instead of adding comments.

smaffulli ( 2014-01-06 14:01:51 -0500 )

You are absolutely right, my apologies! I fixed this - the question is now a single question with the additional information from my comment folded in.

Mathias Ewald ( 2014-01-06 14:57:55 -0500 )

1 answer


answered 2014-01-08 07:12:52 -0500

Mathias Ewald

I have found out (by brute forcing nova.conf settings :D) that

share_dhcp_address=True

is what creates the ebtables rules that drop ARP requests and responses for 10.0.0.1. In the documentation, I read:

If True in multi_host mode, all compute hosts share the same dhcp address. The same IP address used for DHCP will be added on each nova-network node which is only visible to the vms on the same host.

I thought that in multi_host mode nova-network runs on every nova-compute node. Why does the text distinguish between "compute hosts" and "nova-network nodes"?

Here is what I understand from this: In multi_host mode, we get an isolated layer 2 network for every compute node that runs nova-network. The compute node with nova-network serves as the default gateway for the instances controlled by that nova-compute instance. So the compute node will be, e.g., 10.0.0.1/24, and DHCP serves IP addresses to instances starting from 10.0.0.2. With share_dhcp_address all compute nodes will have 10.0.0.1 on their flat network, which - the way I see it - is a requirement for live migration. During my testing I found out that setting

share_dhcp_address=False

(which is the default) makes br100 on the nova-compute/nova-network node get 10.0.0.3 instead of 10.0.0.1 - no idea why, I just thought it would be worth mentioning.
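
For context, the relevant part of my nova.conf boils down to roughly the following (br100 and eth1 are the ones from the outputs above; FlatDHCPManager and the flat_* flags are the usual flat-DHCP setup and only sketched here so the settings under discussion have some context):

# only multi_host and share_dhcp_address are the settings under discussion;
# the surrounding flat-DHCP flags are illustrative
network_manager=nova.network.manager.FlatDHCPManager
flat_network_bridge=br100
flat_interface=eth1
multi_host=True
share_dhcp_address=True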

None of these thoughts leads me to a point where I would say "OK, dropping ARP requests makes sense!". Any ideas/explanations?


