Icehouse Neutron ML2 (OVS & GRE) MTU Problem?
Hi all,
I have built an OpenStack Icehouse cluster on Ubuntu 14.04 by following the installation guide at http://docs.openstack.org/icehouse/install-guide/install/apt/content/ .
My environment looks like the following:
- One (1) Controller node
- One (1) Network node
- Three (3) Compute nodes
This diagram depicts how OpenStack services are distributed on the nodes: http://docs.openstack.org/icehouse/install-guide/install/apt/content/figures/1/figures/installguide_arch-neutron.png
I can launch VMs on a tenant network. They successfully get an IP on the internal network via DHCP. I can successfully attach a public floating IP to the VMs. I can ping both the internal and floating IPs. So far so good.
Now my problems:
- I can't ssh into the instances. ssh -v shows the connection being made, but then it just hangs.
- During bootup, the VMs can't reach the metadata server at http://169.254.169.254 , so no key gets injected. That explains the first problem.
But after bootup, I can get into a Cirros image through the console and curl http://169.254.169.254 successfully. The VM can also reach the outside world (the internet). I can resolve names using Google's 8.8.8.8. But telnet yahoo.com 80 fails.
Searching this forum, I have found two clues:
- The known MTU problem, something like https://ask.openstack.org/en/question/31911/metadata-query-hanging/ . But the fix of setting the VM MTU to 1400 is not working. Cirros doesn't respect the MTU sent via DHCP. On the other Ubuntu images, I can't log in to check.
- Metadata proxy server misconfiguration. I've gone over this several times and I can't find a misconfiguration. All logs seem to show metadata being served successfully.
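For reference, the arithmetic behind the MTU clue can be sketched as below. The header sizes are assumptions not stated in this post (outer IPv4 header of 20 bytes, GRE header of 4 bytes plus a 4-byte key, inner Ethernet header of 14 bytes), and the function names are made up for illustration.

```shell
#!/bin/sh
# Largest inner IP packet that fits into one GRE-encapsulated packet.
# Assumed overhead: 20 (outer IPv4) + 8 (GRE with key) + 14 (inner Ethernet).
gre_guest_mtu() {
    phys_mtu=$1
    echo $(( $phys_mtu - 20 - 8 - 14 ))
}

# ICMP payload size that exactly fills a packet of the given MTU
# (20 bytes IPv4 header + 8 bytes ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

gre_guest_mtu 1500          # prints 1458
ping_payload_for_mtu 1458   # prints 1430
```

Under these assumptions, a guest left at MTU 1500 on a 1500-byte physical network sends packets that are too large once encapsulated, which would fit small packets (ping, DNS) working while larger TCP transfers hang. With iputils ping, something like `ping -M do -s 1430 <address>` sends a packet of that size with the don't-fragment bit set, which is one way to probe where the limit actually is.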
Here are some pertinent config files from the environment:
nova.conf from Controller node
root@controller:~# cat /etc/nova/nova.conf
[DEFAULT]
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/var/lock/nova
force_dhcp_release=True
iscsi_helper=tgtadm
libvirt_use_virtio_for_bridges=True
connection_type=libvirt
root_helper=sudo nova-rootwrap /etc/nova/rootwrap.conf
verbose=True
ec2_private_dns_show_ip=True
api_paste_config=/etc/nova/api-paste.ini
volumes_path=/var/lib/nova/volumes
enabled_apis=ec2,osapi_compute,metadata
#
rpc_backend = rabbit
rabbit_host = controller
rabbit_password = Fusiondc10!
#
my_ip = 10.20.81.101
vncserver_listen = 10.20.81.101
vncserver_proxyclient_address = 10.20.81.101
#
auth_strategy = keystone
#
network_api_class = nova.network.neutronv2.api.API
neutron_url = http://controller:9696
neutron_auth_strategy = keystone
neutron_admin_tenant_name = service
neutron_admin_username = neutron
neutron_admin_password = Fusiondc10!
neutron_admin_auth_url = http://controller:35357/v2.0
linuxnet_interface_driver = nova.network.linux_net.LinuxOVSInterfaceDriver
firewall_driver = nova.virt.firewall.NoopFirewallDriver
security_group_api = neutron
service_neutron_metadata_proxy = true
neutron_metadata_proxy_shared_secret = e773b738013e04efd8f1
#
libvirt_images_type=rbd
libvirt_images_rbd_pool=vms
libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=cinder
rbd_secret_uuid=94b68ab4-a9d7-4a53-8d49-c80fde07a2bc
libvirt_inject_password=false
libvirt_inject_key=false
libvirt_inject_partition=-1
libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST"
[database]
connection = mysql://nova:Fusiondc10!@controller/nova
[keystone_authtoken]
auth_uri = http://controller:5000
auth_host = controller
auth_port = 35357
auth_protocol = http
admin_tenant_name = service
admin_user = nova
admin_password = Fusiondc10!
root@controller:~#
neutron.conf on Network Server
root@neutron1:~# egrep -v '^$|^#' /etc/neutron/neutron.conf
[DEFAULT]
verbose = True
state_path = /var/lib/neutron
lock_path = $state_path/lock
bind_host = 10.20.81.101 ...
Here is more info I found that points towards an MTU fragmentation issue. Here is what happens in tcpdump when I curl 169.254.169.254/2009-04-04/meta-data from my Cirros guest. It returns successfully, but it takes about 3 minutes. This causes the timeout during bootup.
I see tons of "cksum (incorrect)" messages and [DF] ("don't fragment") flags. Changing the MTU of the VM doesn't help.
Have you had any success in solving this issue?
I finally got it working last week by doing a clean install following the updated OpenStack Installation Guide for Ubuntu (as of Sep 19, 2014). There was an addition to dhcp_agent.ini on the Network node regarding the MTU. All is good. :)
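The post above does not quote the addition itself. As far as I recall, the updated guide points the DHCP agent at a dnsmasq config file that forces DHCP option 26 (interface MTU) to a value below the GRE limit. The sketch below only prints what those two fragments would look like; the file path /etc/neutron/dnsmasq-neutron.conf and the value 1454 are assumptions on my part, and the function name is made up, so check them against the guide before using them.

```shell
#!/bin/sh
# Prints the two config fragments; it does not modify any files.
# The dnsmasq file path and the MTU value are assumptions, not quoted from the post.
print_mtu_fragments() {
    guest_mtu=$1
    cat <<EOF
# /etc/neutron/dhcp_agent.ini on the network node, in [DEFAULT]:
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf (DHCP option 26 = interface MTU):
dhcp-option-force=26,${guest_mtu}
EOF
}

print_mtu_fragments 1454
```

A change like this would presumably only take effect after the DHCP agent (and its dnsmasq processes) is restarted and the instances renew their leases. Images that ignore the DHCP MTU option, as Cirros is reported to do earlier in this thread, would still need the MTU set inside the guest.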