I have a very confusing issue which I can't seem to resolve.
I followed a guide by Loic Dachary on installing OpenStack Folsom on Wheezy, and deployed it on two hosts: a cluster node and my workstation.
On these two hosts I'm running a benchmarking application that communicates from one host to the other in the following fashion:
The following are the routing tables for the node host, the workstation, and the internal VMs:
(Note: the 10.0.1.0 entry was for an additional network interface. It turned out to be unneeded and is not up on any of the VMs, so it has no impact: nothing is being routed to 10.0.1.x.)
Now, my issue is the following:
The benchmarking application starts an RMI call from one of the VMs on the workstation (say, 10.0.0.2) to one of the VMs on the node (say, 172.23.3.100).
It is my understanding that the following should occur:
1. The VM sees a destination on the 172.23.x.x network, has no specific route for it, and sends the packet through its default route 10.0.0.1 (the workstation compute host).
2. The nova network service sees that 10.0.0.2 is mapped to some IP (say, 172.23.12.1) on the local network, so it changes the source IP to that.
3. The workstation host sees destination 172.23.x.x and routes through 172.23.1.1 to 172.23.3.8.
4. Nova network on the node sees the destination IP and, since a mapping (say, 172.23.3.100 -> 10.0.0.7) exists, it changes the destination to 10.0.0.7.
5. The VM on the node with internal IP 10.0.0.7 gets the request from the workstation VM (seen as 172.23.12.1).
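To make the steps above concrete, here is a minimal Python sketch of the address rewriting I expect on the request path. It only models the source and destination rewrites with plain dictionaries, using the example addresses from the steps. The names `Packet`, `snat`, `dnat` and `forward_path` are made up for illustration; this is not how nova-network is actually implemented, just my mental model of it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Only the two header fields that matter for this question."""
    src: str
    dst: str


# Assumed mappings, taken from the example addresses in the steps above.
WORKSTATION_FIXED_TO_FLOATING = {"10.0.0.2": "172.23.12.1"}
NODE_FLOATING_TO_FIXED = {"172.23.3.100": "10.0.0.7"}


def snat(packet, fixed_to_floating):
    """Rewrite the source address if the sender has a routable mapping."""
    return replace(packet, src=fixed_to_floating.get(packet.src, packet.src))


def dnat(packet, floating_to_fixed):
    """Rewrite the destination address if it maps to an internal VM."""
    return replace(packet, dst=floating_to_fixed.get(packet.dst, packet.dst))


def forward_path(packet):
    """Return the packet as it looks at each stage of the request path."""
    leaving_vm = packet
    leaving_workstation = snat(leaving_vm, WORKSTATION_FIXED_TO_FLOATING)
    arriving_at_node_vm = dnat(leaving_workstation, NODE_FLOATING_TO_FIXED)
    return [leaving_vm, leaving_workstation, arriving_at_node_vm]


request = Packet(src="10.0.0.2", dst="172.23.3.100")
for stage in forward_path(request):
    print(f"{stage.src} -> {stage.dst}")
```

Under these assumptions the request leaves the workstation VM as 10.0.0.2 -> 172.23.3.100 and arrives at the node VM as 172.23.12.1 -> 10.0.0.7, which matches what I observe for the request.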
This works fine. The actual request gets sent. However, there is something wrong with the reply.
Here's the log of my application run just after the request is sent:
RemoteException was: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
-> java.rmi.ConnectIOException: Exception creating connection to: 10.0.0.7; nested exception is:
-> java.net.NoRouteToHostException: No route to host
So it must be the case that the node VM replied with its source being 10.0.0.7. In other words, the nova network service never changed the source IP of the node VM to its routable 172.23.x.x address!
How is this possible? Both VMs can ping each other (node->workstation, workstation->node), and I don't see anything wrong with the routing tables.
The only odd factor I can see is the following traceroute information:
Workstation VM -> Node
root@client-01:~# traceroute 172.23.3.8
traceroute to 172.23.3.8 (172.23.3.8), 30 hops max, 60 byte packets
 1  10.0.0.1 (10.0.0.1)  0.406 ms  0.399 ms  0.390 ms
 2  172.23.3.8 (172.23.3.8)  0.381 ms  0.378 ms  0.422 ...