
Weird VM migration behavior

asked 2016-08-05 16:18:25 -0500

fifi

updated 2016-08-06 19:48:40 -0500

I have a four-node Juno deployment (1 controller, 1 network node, and 2 compute nodes) with live migration enabled. I can perform block live migration with no problems and in a reasonable time. My management network is a 1GB network.

The only weird thing I have seen is that when a VM migrates for the first time after its creation, the migration time and the downtime are considerably shorter than in any later migration attempt. For example, for a VM with the "Small" flavor, the migration time and downtime for the first attempt are 80 ms and 206 ms respectively. For the second attempt and every attempt after that, the migration time and downtime range between 120-130 ms and 300-500 ms respectively. The same happens for VMs with other flavors.

Please note that I soft reboot the VM after each migration attempt, and there is a gap of at least 3 minutes between migrations. There is also just one VM in the cluster at a time, and the VM is not loaded.


1 answer


answered 2016-08-07 02:28:02 -0500

pawel-koniszewski

So this isn't weird at all; I'd even say it is expected behavior. I assume that your OpenStack installation is based on QEMU and that you have the default configuration.

You start a fresh VM. At the very beginning it uses 0 MB of memory. During boot, while the OS loads and configures itself, the VM starts to reserve more and more memory. Basically, when the VM wants more memory, QEMU calls malloc to allocate a block of N bytes for that particular VM. Let's say that after the first boot your VM is using 400 MB, which is roughly what QEMU has allocated using malloc. If you live migrate such a VM back and forth, each migration should take about the same time.

However, if you restart the VM, it might request more memory during the restart, so QEMU will call malloc again. The point here is that (by default) once memory has been requested by a VM, it is never relinquished to the host. After the reboot your VM might again be using 400 MB from its own perspective, but it may in fact have, e.g., 800 MB allocated because of higher memory demand during the reboot. QEMU then needs to send 800 MB when live migrating the VM to another host, not 400 MB as before the reboot. This is, IMO, why live migration after the reboot takes longer. You can also observe the network traffic and the amount of data transferred (using e.g. iftop) to try to confirm my theory.
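If you would rather script the traffic check than watch iftop, here is a minimal sketch that reads the transmitted-byte counter of an interface from /proc/net/dev on the source compute node. The helper names and the interface name are my own placeholders, not anything from nova or QEMU, and it assumes a Linux host where the migration traffic leaves through that one interface.

```python
def parse_tx_bytes(proc_net_dev_text, iface):
    """Return the transmitted-byte counter for iface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        if name.strip() == iface:
            fields = counters.split()
            # Fields 0-7 are receive counters; field 8 is transmitted bytes.
            return int(fields[8])
    raise ValueError("interface not found: " + iface)


def read_tx_bytes(iface, path="/proc/net/dev"):
    """Read the current transmitted-byte counter for iface (Linux only)."""
    with open(path) as f:
        return parse_tx_bytes(f.read(), iface)


def transferred_mib(before, after):
    """Convert a difference of two byte counters to MiB."""
    return (after - before) / (1024 * 1024)
```

Call read_tx_bytes("eth0") (replace eth0 with your management interface) right before and right after each migration and pass both values to transferred_mib. If the theory holds, the number for the second migration should be clearly larger than for the first one. Other traffic on the same interface is counted too, so keep the cluster idle while measuring.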

You can also try to configure ballooning to automatically relinquish memory to the host, if you really need that.
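To see whether memory really stays allocated on the host (and whether ballooning gives any of it back), you can compare the resident set size of the QEMU process before and after a reboot of the guest. A minimal sketch, assuming a Linux host and that you already know the PID of the QEMU process for your instance; the helper names are mine, and RSS is only a rough proxy for what the migration will have to send.

```python
def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    409600 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def read_vm_rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vm_rss_kib(f.read())
```

Run read_vm_rss_kib(pid) once after the first boot and again after the soft reboot. If the second value is noticeably higher while the guest itself reports about the same memory usage, that matches the explanation above.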

