
Abnormally long VM live migration time in juno

asked 2016-07-08 15:10:43 -0600

fifi

updated 2016-07-20 19:02:05 -0600

I have installed Juno on a 4-node topology (1 controller node, 1 neutron node, 2 compute nodes). All nodes have a similar hardware configuration: Dell PowerEdge R720 machines with Intel Xeon CPUs (16 cores each), 4 TB HDD, and 16 GB RAM. I also have three separate physical networks in my OpenStack deployment: one for management, another dedicated to VM traffic, and a third external network with access to the internet. You can see a schematic view of my OpenStack topology here.

I have configured my compute nodes for block live migration. I can successfully live migrate a tiny instance (CirrOS 0.3.3 x86-64 image, 1 vCPU, 512 MB RAM, and a 1 GB disk) which is not loaded (nothing is installed or running on top of it). However, the migration takes around 90-100 seconds or even more, while it should take something around 5-10 seconds at most. This is a very long migration time, especially for such a tiny instance. I checked many online resources and couldn't find any reason for it. When the migration starts, I monitor the process with virsh. For the first 80 seconds (out of the whole 90-second migration) nothing happens; in the last 10 seconds the migration suddenly starts progressing and then finishes very quickly. It's noteworthy that I'm using QEMU and libvirt.
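For reference, this stalled phase can be watched from the source hypervisor with libvirt's job statistics (the domain name below is a placeholder; list the real names with `virsh list`):

```shell
# On the source compute node, while the migration is in flight.
# "instance-00000001" is a placeholder libvirt domain name.
watch -n 1 virsh domjobinfo instance-00000001
# domjobinfo reports fields such as "Data remaining" and the transfer
# rate; a rate pinned at one constant value suggests a bandwidth cap.
```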

I'd like to ask what causes this abnormally long migration time and what I can do to reduce it to something reasonable. Is this a hypervisor problem, or an orchestration problem on the OpenStack side? I have included my nova.conf file along with the nova-compute log files for both compute nodes (they only cover the period from the start of the migration to its end).



nova-compute log for compute1 (migration destination):

nova-compute log for compute2 (migration source):


2 answers


answered 2016-09-07 14:23:32 -0600

fifi

Increasing the bandwidth solved the problem. There was no bottleneck elsewhere in the network; with a 100 Mbps LAN, almost the whole bandwidth was being consumed by the migration alone.
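A quick way to confirm such a ceiling is to measure raw throughput between the compute nodes' management interfaces; the sketch below uses iperf3, and the hostname is a placeholder:

```shell
# On the destination compute node (management interface):
iperf3 -s
# On the source compute node ("compute1-mgmt" is a placeholder hostname):
iperf3 -c compute1-mgmt -t 10
# At ~11 MB/s (a saturated 100 Mbps link), copying a 512 MB guest plus
# its 1 GB block-migrated disk takes on the order of two minutes, which
# is consistent with the migration times observed here.
```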


answered 2016-07-12 22:02:14 -0600

kaustubh

Live migration uses the management network, so a possible bottleneck is its bandwidth. Also, you might want to check whether your host CPUs are being consumed by some process(es).

You have configured live_migration_uri=qemu+tcp://%s/system. The %s is replaced by the hostname of the peer node, which must be resolved. Perhaps the DNS lookup is taking time?
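As a quick sanity check on the name-resolution theory, one can time the lookup on the source node ("localhost" below is just a stand-in; substitute the peer compute node's hostname):

```shell
# Time the hostname lookup used when opening the qemu+tcp connection.
# Replace "localhost" with the destination compute node's hostname.
time getent hosts localhost
# A slow or timing-out lookup here would add a fixed delay to every
# migration attempt.
```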



Thanks for your answer. Since i) there is no load on the system, ii) the VM's memory size is very small, and iii) there is only one VM in the whole system, the network cannot be a bottleneck. The DNS lookup works fine. However, during migration, the source compute node consumes more than 90% of its RAM.

fifi ( 2016-07-13 12:56:04 -0600 )

While performing the live migration, can you run the top command on your source compute node to see which process is using the RAM?

kaustubh ( 2016-07-13 17:27:39 -0600 )

I'm using Ubuntu 14.04 LTS with GNOME on the compute nodes. The most CPU- and memory-consuming processes are libvirtd and qemu-system.

fifi ( 2016-07-13 19:35:49 -0600 )

During migration, the traffic load on the management network is always a constant value (11 MB/s == 88 Mbps). This has something to do with a hypervisor parameter called the maximum transfer rate limit. I just don't know how to change this value in my compute nodes' hypervisors.
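If libvirt's per-migration bandwidth cap is what is pinning the rate, it can be inspected and raised with virsh (values are in MiB/s; the domain name below is a placeholder):

```shell
# Query the current migration bandwidth cap for a running domain
# ("instance-00000001" is a placeholder libvirt domain name).
virsh migrate-getspeed instance-00000001
# Raise the cap, e.g. to 100 MiB/s:
virsh migrate-setspeed instance-00000001 --bandwidth 100
```

Nova also exposes a live_migration_bandwidth option in nova.conf which, as far as I know, defers to the hypervisor default when unset. Note that raising the cap only helps if the underlying link can carry it; on a 100 Mbps LAN the wire itself is the limit.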

fifi ( 2016-07-20 19:01:27 -0600 )



Last updated: Sep 07 '16