Cold migration on Queens with local disks fails due to permission issues
Hello,
I am running OpenStack Queens on Fedora 28 as a test environment to understand how it works.
One of the items I have been struggling with is cold migration with local disks.
Here are my steps:
- Launch a VM with 20GB local disk and 20GB ephemeral storage. The VM comes up and is functional.
- Initiate a cold migration using nova migrate <instance-uuid> --poll
- The VM is assigned to a new hypervisor and the disk is copied as well. However, the permissions on the disk are different on the destination hypervisor.
Here is the output of openstack server show <uuid> after launching the VM:
[root@control2015 ~]# openstack server show ebde3173-cc7f-401f-8d6c-16f89621a285 -f json | jq .
{
"OS-EXT-STS:task_state": null,
"addresses": "network-72=10.189.72.102",
"image": "OL-7 (b5add240-b20b-452e-855f-ef01ed49d138)",
"OS-EXT-STS:vm_state": "active",
"OS-EXT-SRV-ATTR:instance_name": "instance-00000024",
"OS-SRV-USG:launched_at": "2019-10-21T18:00:40.000000",
"flavor": "test-flavor (7ec79eb8-dc43-4b4a-8ac2-81e18b67ae82)",
"id": "ebde3173-cc7f-401f-8d6c-16f89621a285",
"security_groups": "name='open'",
"volumes_attached": "",
"user_id": "c17f28d0bd654d9ba04671ca72ee625f",
"OS-DCF:diskConfig": "AUTO",
"accessIPv4": "",
"accessIPv6": "",
"progress": 0,
"OS-EXT-STS:power_state": "Running",
"OS-EXT-AZ:availability_zone": "devstack2",
"config_drive": "",
"status": "ACTIVE",
"updated": "2019-10-21T18:00:40Z",
"hostId": "674690363f457e075023e145885db5f1b8f174a516891854fcc1c7f0",
"OS-EXT-SRV-ATTR:host": "compute2004",
"OS-SRV-USG:terminated_at": null,
"key_name": "kkanjee-general",
"properties": "",
"project_id": "8bba4dea354648d0ada7c4781c6306a5",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "compute2004",
"name": "test",
"created": "2019-10-21T18:00:28Z"
}
The disk permissions and ownership on compute2004 where the VM was originally launched:
[root@compute2004 ~]# ls -alrt /var/lib/nova/instances/ebde3173-cc7f-401f-8d6c-16f89621a285
total 41943100
-rw-r--r-- 1 nova nova 162 Oct 21 14:00 disk.info
drwxrwxr-x 7 nova nova 193 Oct 21 14:00 ..
drwxr-xr-x 2 nova nova 71 Oct 21 14:00 .
-rw------- 1 root root 55760 Oct 21 14:01 console.log
-rw-r--r-- 1 qemu qemu 393216 Oct 21 14:01 disk.eph0
-rw-r--r-- 1 qemu qemu 121438208 Oct 21 14:06 disk
The migration command fails:
[root@control2015 ~]# nova migrate ebde3173-cc7f-401f-8d6c-16f89621a285 --poll
Server migrating... 0% complete
Error migrating server
ERROR (ResourceInErrorState):
The VM is scheduled on a different hypervisor, as can be seen below. The disk is also transferred, but its permissions and ownership are different, which causes the permission denied error:
[root@control2015 ~]# openstack server show ebde3173-cc7f-401f-8d6c-16f89621a285 -f json | jq .
{
"OS-EXT-STS:task_state": null,
"addresses": "network-72=10.189.72.102",
"image": "OL-7 (b5add240-b20b-452e-855f-ef01ed49d138)",
"OS-EXT-STS:vm_state": "error",
"OS-EXT-SRV-ATTR:instance_name": "instance-00000024",
"OS-SRV-USG:launched_at": "2019-10-21T18:00:40.000000",
"flavor": "test-flavor (7ec79eb8-dc43-4b4a-8ac2-81e18b67ae82)",
"id": "ebde3173-cc7f-401f-8d6c-16f89621a285",
"security_groups": "name='open'",
"volumes_attached": "",
"user_id": "c17f28d0bd654d9ba04671ca72ee625f",
"OS-DCF:diskConfig": "AUTO",
"accessIPv4": "",
"accessIPv6": "",
"OS-EXT-STS:power_state": "Running",
"OS-EXT-AZ:availability_zone": "devstack2",
"config_drive": "",
"status": "ERROR",
"updated": "2019-10-21T18:08:54Z",
"hostId": "1209685e863f9c7368ac3407e593289ef88a9cc2aa3f12f6f7037fbd",
"OS-EXT-SRV-ATTR:host": "compute2001",
"OS-SRV-USG:terminated_at": null,
"key_name": "kkanjee-general",
"properties": "",
"project_id": "8bba4dea354648d0ada7c4781c6306a5",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "compute2001",
"name": "test",
"created": "2019-10-21T18:00:28Z",
"fault": {
"message": "libvirtError",
"code": 500,
"details": "Traceback (most recent call last):\n File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 203, in decorated_function\n return function(self, context, *args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 4570, in finish_resize\n self._revert_allocation(context, instance, migration)\n File \"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py\", line 220, in __exit__\n self.force_reraise()\n File \"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py\", line 196, in force_reraise\n six.reraise(self.type_ ...
I made some progress by changing nova.conf to use rsync as the remote file transport driver (remote_filesystem_transport = rsync in the [libvirt] section).
With that change, cold migrations complete successfully. However, the permissions on the destination are not the same as on the source.
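One way to see exactly what differs is to record owner, group and mode for every file in the instance directory on each host and diff the two listings. This is only a sketch, assuming GNU stat is available on the compute nodes; the function name perm_snapshot and the /tmp output files are illustrative, not part of nova.

```shell
#!/bin/sh
# perm_snapshot is a hypothetical helper name, not an OpenStack command.
# It prints "owner:group mode name" for every entry in a directory,
# sorted by name so that listings taken on two hosts can be diffed.
perm_snapshot() {
    dir="$1"
    # GNU stat format codes: %U owner, %G group, %a octal mode, %n name
    ( cd "$dir" && stat -c '%U:%G %a %n' -- * | sort -k3 )
}

# Usage (replace <instance-uuid> with the real UUID):
#   on the source host, before migrating:
#     perm_snapshot /var/lib/nova/instances/<instance-uuid> > /tmp/perms.src
#   on the destination host, after migrating:
#     perm_snapshot /var/lib/nova/instances/<instance-uuid> > /tmp/perms.dst
#   then copy one file across and run:
#     diff /tmp/perms.src /tmp/perms.dst
```

Sorting by file name keeps the diff limited to real ownership or mode changes rather than directory ordering.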
Check that the SELinux labels (contexts) are correct for the /var/lib/nova directory on the destination host.
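The labels can be inspected on the destination compute node without changing anything. A minimal sketch, assuming the standard SELinux userland tools (getenforce, restorecon) are installed and that instances live under /var/lib/nova/instances as in the listing above; check_nova_selinux is just an illustrative name.

```shell
#!/bin/sh
# check_nova_selinux is a hypothetical helper, not an OpenStack command.
check_nova_selinux() {
    dir="${1:-/var/lib/nova/instances}"
    if ! command -v getenforce >/dev/null 2>&1; then
        echo "SELinux tools not installed; nothing to check"
        return 0
    fi
    # Current SELinux mode: Enforcing, Permissive or Disabled
    getenforce
    # Show the security context of each entry in the directory
    ls -lZ "$dir"
    # Dry run: -R recurse, -n change nothing, -v report entries whose
    # context differs from what the loaded policy expects
    restorecon -Rnv "$dir"
}

# Usage on the destination host:
#   check_nova_selinux /var/lib/nova/instances
```

If the dry run reports mismatches, running restorecon -Rv on the same path would apply the policy defaults. Whether that also resolves the ownership difference described above is not something this check can confirm.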