
live migration fails with nfs4 mounted /var/lib/nova/instances

asked 2011-07-13 17:21:41 -0500

p-spencer-davis

I'm attempting to set up live migration of instances. I have two nodes in my pod, both running Ubuntu 11.04 with nova installed from the http://ppa.launchpad.net/nova-core/trunk/ubuntu PPA. They have a public 10.4.78.0/24 network attached to eth0 and a 192.168.0.0/24 private network. I'm using glance for image storage. The master node, which runs glance, nova-api, nova-compute, nova-network and nova-volume, shares /var/lib/nova/instances with the compute node via NFSv4. Both nodes have KVM virtualization enabled.
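For reference, a minimal sketch of the kind of NFS share described above, assuming hostnames csvirt-1 (master) and csvirt-2 (compute) and that the whole instances path is exported directly; an NFSv4 setup that uses an fsid=0 pseudo-root would need the export root adjusted:

# on the master (NFS server), e.g. in /etc/exports
/var/lib/nova/instances 192.168.0.0/24(rw,sync,no_subtree_check,no_root_squash)

# apply and confirm the export
sudo exportfs -ra
sudo exportfs -v

# on the compute node (NFS client), e.g. in /etc/fstab
csvirt-1:/var/lib/nova/instances /var/lib/nova/instances nfs4 defaults 0 0

# mount it and check
sudo mount /var/lib/nova/instances
mount | grep /var/lib/nova/instances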

I can start instances on the master node, but not on the compute node, and I cannot live-migrate from the master to the compute node. When I run

nova-manage vm live_migrate i-00000016 csvirt-2

I get the following errors in csvirt-2's /var/log/nova/nova-compute.log

2011-07-13 10:58:42,549 DEBUG nova.compute.manager [-] instance network_info: |[[{u'injected': False, u'bridge': u'br_vlan1', u'cidr_v6': None, u'cidr': u'172.16.1.0/24', u'id': 1}, {u'label': u'vlan1', u'broadcast': u'172.16.1.255', u'ips': [{u'ip': u'172.16.1.9', u'netmask': u'255.255.255.0', u'enabled': u'1'}], u'mac': u'02:16:3e:62:f0:91', u'rxtx_cap': 0, u'dns': [None], u'gateway': u'172.16.1.7'}]]| from (pid=1037) _run_instance /usr/lib/pymodules/python2.7/nova/compute/manager.py:295
2011-07-13 10:58:42,553 DEBUG nova.utils [-] Attempting to grab semaphore "ensure_vlan" for method "ensure_vlan"... from (pid=1037) inner /usr/lib/pymodules/python2.7/nova/utils.py:600
2011-07-13 10:58:42,553 DEBUG nova.utils [-] Attempting to grab file lock "ensure_vlan" for method "ensure_vlan"... from (pid=1037) inner /usr/lib/pymodules/python2.7/nova/utils.py:605
2011-07-13 10:58:42,554 DEBUG nova.utils [-] Running cmd (subprocess): ip link show dev vlan1 from (pid=1037) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 10:58:42,558 DEBUG nova.utils [-] Result was 255 from (pid=1037) execute /usr/lib/pymodules/python2.7/nova/utils.py:161
2011-07-13 10:58:42,558 DEBUG nova.linux_net [-] Starting VLAN inteface vlan1 from (pid=1037) ensure_vlan /usr/lib/pymodules/python2.7/nova/network/linux_net.py:465
2011-07-13 10:58:42,559 DEBUG nova.utils [-] Running cmd (subprocess): sudo vconfig set_name_type VLAN_PLUS_VID_NO_PAD from (pid=1037) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 10:58:42,713 DEBUG nova.utils [-] Running cmd (subprocess): sudo vconfig add eth1 1 from (pid=1037) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 10:58:42,722 DEBUG nova.utils [-] Running cmd (subprocess): sudo ip link set vlan1 up from (pid=1037) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 10:58:42,733 DEBUG nova.utils [-] Attempting to grab semaphore "ensure_bridge" for method "ensure_bridge"... from (pid=1037) inner /usr/lib/pymodules/python2.7/nova/utils.py:600
2011-07-13 ... (more)


13 answers


answered 2012-06-10 02:28:17 -0500

Thanks, it worked for me.

My /etc/fstab entry:

cloud03:/var/lib/nova/instances /var/lib/nova/instances nfs defaults,nfsvers=3 0 0

Is this a problem with NFSv4?
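If it helps anyone applying this workaround, here is a generic way (not specific to this deployment) to confirm which NFS version a mount actually negotiated:

# show the negotiated mount options for the shared instances directory
nfsstat -m
# or
mount | grep /var/lib/nova/instances
# with the fstab line above you should see vers=3 (or nfsvers=3) in the output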


answered 2011-07-13 23:44:38 -0500

p-spencer-davis

On further reading, I added /etc/hosts entries on both nodes for the public and private interfaces, edited /etc/default/libvirt-bin to change libvirtd_opts="-d" to libvirtd_opts="-d -l" (so libvirtd listens for remote connections),

and made sure that both nodes had the same uid for the nova user:

root@csvirt-1:~# id nova
uid=107(nova) gid=65534(nogroup) groups=65534(nogroup),114(libvirtd)
csadmin@csvirt-2:~$ id nova
uid=107(nova) gid=65534(nogroup) groups=65534(nogroup),114(libvirtd)
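For anyone following along, a sketch of those two edits; the IP addresses are placeholders guessed from the log output in this thread, so substitute your own:

# /etc/hosts on both nodes (placeholder addresses -- use your own)
10.4.78.190   csvirt-1
10.4.78.172   csvirt-2

# /etc/default/libvirt-bin: make libvirtd listen for remote connections
libvirtd_opts="-d -l"

# pick up the change
sudo service libvirt-bin restart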

I ran updates:

root@csvirt-1:~# nova-manage version list
2011.3-dev (2011.3-workspace:tarmac-20110713210648-78jfe9lv1w9r29c7)

and just to be on the safe side restarted both servers.

When I started an instance it went to the compute-only node and failed to start with the following error:

2011-07-13 19:30:28,353 DEBUG nova.rpc [-] received {u'_context_request_id': u'5ZP4AWS0NLP3WX3F-HCL', u'_context_read_deleted': False, u'args': {u'instance_id': 31}, u'_context_is_admin': True, u'_context_timestamp': u'2011-07-13T23:30:28Z', u'_context_user': u'cscloud', u'method': u'terminate_instance', u'_context_project': u'cscloud-base', u'_context_remote_address': u'10.4.78.190'} from (pid=1155) process_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-13 19:30:28,353 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-13T23:30:28Z', 'msg_id': None, 'remote_address': u'10.4.78.190', 'project': u'cscloud-base', 'is_admin': True, 'user': u'cscloud', 'request_id': u'5ZP4AWS0NLP3WX3F-HCL', 'read_deleted': False} from (pid=1155) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-13 19:30:28,353 INFO nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] check_instance_lock: decorating: |<function terminate_instance at 0x21d7de8>|
2011-07-13 19:30:28,353 INFO nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x1bda8d0>| |<nova.rpc.RpcContext object at 0x371cf50>| |31|
2011-07-13 19:30:28,354 DEBUG nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] instance 31: getting locked state from (pid=1155) get_lock /usr/lib/pymodules/python2.7/nova/compute/manager.py:959
2011-07-13 19:30:28,390 INFO nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] check_instance_lock: locked: |False|
2011-07-13 19:30:28,390 INFO nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] check_instance_lock: admin: |True|
2011-07-13 19:30:28,391 INFO nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] check_instance_lock: executing: |<function terminate_instance at 0x21d7de8>|
2011-07-13 19:30:28,428 AUDIT nova.compute.manager [5ZP4AWS0NLP3WX3F-HCL cscloud cscloud-base] Terminating instance 31
2011-07-13 19:30:28,428 DEBUG nova.rpc [-] Making asynchronous cast on network... from (pid=1155) cast /usr/lib/pymodules/python2.7/nova/rpc.py:554
2011-07-13 19:30:28,466 DEBUG nova.utils [-] Attempting to grab semaphore "iptables" for method "apply"... from (pid=1155) inner /usr/lib/pymodules/python2.7/nova/utils.py:600
2011-07-13 19:30:28,467 DEBUG nova.utils [-] Attempting to grab file lock "iptables" for method "apply"... from (pid=1155) inner /usr/lib/pymodules/python2.7/nova/utils.py:605
2011-07-13 19:30:28,467 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t filter from (pid=1155) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 19:30:28,474 INFO nova.virt.libvirt_conn [-] Instance instance-0000001f destroyed successfully.
2011-07-13 19:30:28,478 ... (more)


answered 2011-07-13 23:57:10 -0500

p-spencer-davis

After the above process I was able to get an instance to run on the master node, and attempted to migrate it to the compute node using nova-manage:

nova-manage vm live_migration i-00000022 csvirt-2
Migration of i-00000022 initiated. Check its progress using euca-describe-instances.

It failed silently on the master and produced the following on the compute node:

2011-07-13 19:51:22,852 DEBUG nova.rpc [-] received {u'_msg_id': u'3cd149504434430db40709b417b07de9', u'_context_read_deleted': False, u'_context_request_id': u'IZFUV30YHBK4U6JQ2BMI', u'_context_timestamp': u'2011-07-13T23:51:22Z', u'_context_is_admin': True, u'_context_user': None, u'method': u'create_shared_storage_test_file', u'_context_project': None, u'_context_remote_address': None} from (pid=1155) process_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-13 19:51:22,852 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-13T23:51:22Z', 'msg_id': u'3cd149504434430db40709b417b07de9', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': u'IZFUV30YHBK4U6JQ2BMI', 'read_deleted': False} from (pid=1155) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-13 19:51:23,038 DEBUG nova.compute.manager [-] Creating tmpfile /var/lib/nova/instances/tmpm8_vWK to notify to other compute nodes that they should mount the same storage. from (pid=1155) create_shared_storage_test_file /usr/lib/pymodules/python2.7/nova/compute/manager.py:1126
2011-07-13 19:51:23,321 DEBUG nova.rpc [-] received {u'_msg_id': u'858d833cf1014f398fb24e7eea8adbb3', u'_context_read_deleted': False, u'_context_request_id': u'IZFUV30YHBK4U6JQ2BMI', u'args': {u'filename': u'tmpm8_vWK'}, u'_context_is_admin': True, u'_context_timestamp': u'2011-07-13T23:51:22Z', u'_context_user': None, u'method': u'cleanup_shared_storage_test_file', u'_context_project': None, u'_context_remote_address': None} from (pid=1155) process_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-13 19:51:23,322 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-13T23:51:22Z', 'msg_id': u'858d833cf1014f398fb24e7eea8adbb3', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': u'IZFUV30YHBK4U6JQ2BMI', 'read_deleted': False} from (pid=1155) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-13 19:51:23,452 DEBUG nova.rpc [-] received {u'_msg_id': u'd9bd3c99ca00412ca5fb3e25eb0dbf79', u'_context_read_deleted': False, u'_context_request_id': u'IZFUV30YHBK4U6JQ2BMI', u'args': {u'cpu_info': u'{"vendor": "Intel", "model": "Westmere", "arch": "x86_64", "features": ["rdtscp", "pdpe1gb", "dca", "xtpr", "tm2", "est", "vmx", "ds_cpl", "monitor", "pbe", "tm", "ht", "ss", "acpi", "ds", "vme"], "topology": {"cores": "6", "threads": "2", "sockets": "2"}}'}, u'_context_is_admin': True, u'_context_timestamp': u'2011-07-13T23:51:22Z', u'_context_user': None, u'method': u'compare_cpu', u'_context_project': None, u'_context_remote_address': None} from (pid=1155) process_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-13 19:51:23,452 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-13T23:51:22Z', 'msg_id': u'd9bd3c99ca00412ca5fb3e25eb0dbf79', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': u'IZFUV30YHBK4U6JQ2BMI', 'read_deleted': False} from (pid=1155) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-13 19:51:23,452 INFO nova.virt.libvirt_conn [-] Instance launched has CPU info: {"vendor": "Intel", "model": "Westmere", "arch": "x86_64", "features": ["rdtscp", "pdpe1gb", "dca", "xtpr", "tm2", "est", "vmx", "ds_cpl", "monitor", "pbe", "tm", "ht", "ss", "acpi", "ds", "vme"], "topology": {"cores": "6", "threads": "2", "sockets": "2"}}
2011-07-13 19:51:23,453 INFO nova.virt.libvirt_conn [-] to xml... :<cpu> <arch>x86_64</arch> <model>Westmere</model> <vendor ... (more)


answered 2011-07-14 00:00:37 -0500

p-spencer-davis

Here is the log from /var/log/libvirt/qemu/instance-00000022.log

2011-07-13 19:47:12.512: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name instance-00000022 -uuid 6c22f054-25e4-3f5d-3274-286692e1feec -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000022.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -boot c -kernel /var/lib/nova/instances/instance-00000022/kernel -append root=/dev/vda console=ttyS0 -drive file=/var/lib/nova/instances/instance-00000022/disk,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/nova/instances/instance-00000022/disk.local,if=none,id=drive-virtio-disk1,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=17,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=02:16:3e:1f:ce:b0,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/instance-00000022/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -usb -vnc 0.0.0.0:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/1
kvm: -device rtl8139,netdev=hostnet0,id=net0,mac=02:16:3e:1f:ce:b0,bus=pci.0,addr=0x3: pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"


answered 2011-07-14 00:05:44 -0500

p-spencer-davis

According to http://docs.openstack.org/cactus/openstack-compute/admin/content/configuring-live-migrations.html the file permissions on /var/lib/nova/instances should be:

ls -ld NOVA-INST-DIR/instances/

drwxr-xr-x 2 root root 4096 2010-12-07 14:34 nova-install-dir/instances/

I have

root@csvirt-1:/var/log/libvirt/qemu# ls -ld /var/lib/nova/instances/
drwxr-xr-x 4 nova root 4096 2011-07-13 19:51 /var/lib/nova/instances/

Is this the issue, or is the documentation out of date?


answered 2011-07-14 00:23:46 -0500

p-spencer-davis

The libvirt-qemu user is uid 105 on the master and 106 on the compute node, and creation of new virtual machines on the compute node fails because uid 106 (libvirt-qemu) doesn't have write permission on the NFS-mounted /var/lib/nova/instances...
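A quick, generic way to spot this kind of uid mismatch (run on each node and compare; nothing here is specific to this deployment):

# compare the numeric ids of the service accounts on both nodes
id nova
id libvirt-qemu
# and check who actually owns the shared directory and its contents
stat -c '%u:%g %U:%G %a %n' /var/lib/nova/instances /var/lib/nova/instances/*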

Here is the error I get when I try to manually migrate i-00000022 with virsh:

csadmin@csvirt-1:~$ virsh migrate --live instance-00000022 qemu+ssh://csvirt-2/system
The authenticity of host 'csvirt-2 (10.4.78.172)' can't be established.
ECDSA key fingerprint is ad:17:f3:d4:29:01:f7:bb:91:ad:22:8e:c1:49:b3:cb.
Are you sure you want to continue connecting (yes/no)? yes
csadmin@csvirt-2's password:
error: unable to set user and group to '106:112' on '/var/lib/nova/instances/instance-00000022/disk': Invalid argument


answered 2011-07-14 02:14:01 -0500

p-spencer-davis

I unmounted the NFS share on the compute node, disabled nova-compute there, and shut down libvirt-bin and nova-compute. I then used /usr/sbin/usermod to change the uids of libvirt-qemu and nova so that they are the same on both the master and compute nodes. After that I restarted idmapd on the master node, re-mounted /var/lib/nova/instances over NFS on the compute node, restarted libvirt-bin and nova-compute on the compute node, and re-enabled nova-compute.
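A sketch of that uid alignment, assuming you standardise on uid 107 for nova and 105 for libvirt-qemu (the values implied by this thread; pick whatever is free on both nodes, and chown anything the old uids still own):

# on the compute node, with the NFS share unmounted and the services stopped
sudo service nova-compute stop
sudo service libvirt-bin stop

# give both service accounts the same numeric uid as on the master
sudo usermod -u 107 nova
sudo usermod -u 105 libvirt-qemu

# fix up any local files still owned by the old uid (106 was the old libvirt-qemu uid here)
sudo find / -xdev -uid 106 -exec chown libvirt-qemu {} \;

# re-mount the share and bring the services back
sudo mount /var/lib/nova/instances
sudo service libvirt-bin start
sudo service nova-compute start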

Now, I can start instances on both nodes and they work properly. I can manually use virsh and qemu+ssh to migrate instances between nodes, but not nova-manage. I believe that I do not have libvirt properly configured for qemu+tcp.
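For reference, the qemu+tcp configuration the Cactus live-migration guide describes amounts to something like the sketch below; note that auth_tcp = "none" disables authentication, so it only makes sense on a trusted management network:

# /etc/libvirt/libvirtd.conf
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"

# /etc/default/libvirt-bin (makes libvirtd actually listen)
libvirtd_opts="-d -l"

# restart and verify the TCP listener
sudo service libvirt-bin restart
sudo netstat -lntp | grep libvirtd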

Am I the only one having issues with the uids for these services? Would it make sense to set aside some system uids for nova and libvirt-qemu so that they are consistent across installations, or to make a note of this in the install guides? Is this just something that I should have been aware of when installing the services?


answered 2011-07-14 02:14:42 -0500

p-spencer-davis

TL;DR: not a problem with Nova.


answered 2012-04-03 12:48:13 -0500

mandarvaze

I ran into the same issue, but several months after the above problem was resolved; this is close to the Essex release.

I am not trying live migration.

I have an NFS-mounted instances_path, so when I try to spawn an instance I run into the above errors, especially the following:

File "/usr/lib/python2.7/dist-packages/libvirt.py", line 372, in createWithFlags 40842 2012-04-03 05:42:27 TRACE nova.rpc.amqp if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self) 40843 2012-04-03 05:42:27 TRACE nova.rpc.amqp libvirtError: internal error Process exited while reading console log output: chardev: opening backend "file" failed

But as you can see below, several files are created in this folder, so I am not sure mine is a permissions issue (otherwise none of the files would get created). The problem is reported when libvirt tries to write to console.log; the file itself is created with what look like correct permissions, it is just a zero-byte file.

mandar@ubuntu-dev-mandar:/nfs_shared_instances_path/instance-00000005$ ll
total 10944
drwxrwxr-x 2 mandar libvirtd    4096 2012-04-03 05:42 ./
drwxrwxrwx 4 root   root        4096 2012-04-03 05:42 ../
-rw-rw---- 1 mandar libvirtd       0 2012-04-03 05:42 console.log
-rw-r--r-- 1 mandar libvirtd 6291968 2012-04-03 05:42 disk
-rw-rw-r-- 1 mandar libvirtd 4731440 2012-04-03 05:42 kernel
-rw-rw-r-- 1 mandar libvirtd    1067 2012-04-03 05:42 libvirt.xml
-rw-rw-r-- 1 mandar libvirtd 2254249 2012-04-03 05:42 ramdisk

I'm suspecting https://bugs.launchpad.net/ubuntu/maverick/+source/libvirt/+bug/632696 , but the above doesn't show itself in a non-NFS setup.
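One thing that may be worth checking in this situation (an assumption on my part, not a confirmed diagnosis) is whether the export squashes root, since qemu/libvirt may reopen console.log under a different uid than the one that created it:

# on the NFS server: show the active export options (look for root_squash / no_root_squash)
sudo exportfs -v

# on the compute host: show the negotiated mount options for the instances path
nfsstat -m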


answered 2012-06-19 08:44:46 -0500

We got the same error, and after a lot of searching we remounted the partition from NFSv4 to NFSv3. We also set the idmapd user/group to nova/nova, and finally...

It's working now.

Thanks for the info, it was very helpful.
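For anyone staying on NFSv4 instead of downgrading, the idmapd tweak mentioned above would look roughly like this on Ubuntu; treat it as a sketch, and note that mapping unknown owners to nova/nova is this answer's workaround, not an official recommendation (your nova group may be nogroup, as earlier in this thread):

# /etc/idmapd.conf on both client and server
[Mapping]
Nobody-User = nova
Nobody-Group = nova

# restart the id-mapping daemon so the change takes effect
sudo service idmapd restart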
