
Why does live migration have so much downtime when using XCP and OpenStack?

asked 2013-07-15 16:03:44 -0600 by cyberang3l

In my setup I use two XCP servers on top of Debian Wheezy (the xcp-xapi package), and the OpenStack nova-compute VM runs on Ubuntu 12.04 with OpenStack Grizzly.

I configured live migration based on the documentation, and I had to apply some patches to solve these issues:
https://bugs.launchpad.net/nova/+bug/...
https://bugs.launchpad.net/nova/+bug/...
https://bugs.launchpad.net/nova/+bug/...

Eventually the migration works, but I see a very long downtime. I wrote a simple script that runs in a while loop inside the VM being migrated, printing the time and pinging another reachable IP, and this is the result:

Mon Jul 15 09:45:50 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.523 ms
Mon Jul 15 09:45:51 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.504 ms
Mon Jul 15 09:45:52 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.520 ms
Mon Jul 15 09:48:58 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.569 ms
Mon Jul 15 09:48:59 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.510 ms
Mon Jul 15 09:49:00 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.484 ms

As you can see, the last ping before the migration starts is at 09:45:52, and the next one arrives more than three minutes later, at 09:48:58. Here is how the migration was initiated from the controller:

root@controller:~# nova list --fields name,host,instance_name,networks,status
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+
| ID                                   | Name | Host                 | Instance Name     | Networks             | Status |
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+
| d0958165-767e-425e-a9cd-ff7f501be76d | KVM1 | kvmcompute1          | instance-00000037 | novanet=192.168.30.4 | ACTIVE |
| b69eeb2d-7737-40fb-a5b8-a71a582d8f73 | XCP1 | openstackxcpcompute2 | instance-00000044 | novanet=192.168.30.2 | ACTIVE |
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+
root@controller:~# nova live-migration b69eeb2d-7737-40fb-a5b8-a71a582d8f73 openstackxcpcompute1
root@controller:~# nova list --fields name,host,instance_name,networks,status
+--------------------------------------+------+----------------------+-------------------+----------------------+-----------+
| ID                                   | Name | Host                 | Instance Name     | Networks             | Status    |
+--------------------------------------+------+----------------------+-------------------+----------------------+-----------+
| d0958165-767e-425e-a9cd-ff7f501be76d | KVM1 | kvmcompute1          | instance-00000037 | novanet=192.168.30.4 | ACTIVE    |
| b69eeb2d-7737-40fb-a5b8-a71a582d8f73 | XCP1 | openstackxcpcompute2 | instance-00000044 | novanet=192.168.30.2 | MIGRATING |
+--------------------------------------+------+----------------------+-------------------+----------------------+-----------+
root@controller:~# nova list --fields name,host,instance_name,networks,status
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+
| ID                                   | Name | Host                 | Instance Name     | Networks             | Status |
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+
| d0958165-767e-425e-a9cd-ff7f501be76d | KVM1 | kvmcompute1          | instance-00000037 | novanet=192.168.30.4 | ACTIVE |
| b69eeb2d-7737-40fb-a5b8-a71a582d8f73 | XCP1 | openstackxcpcompute1 | instance-00000044 | novanet=192.168.30.2 | ACTIVE |
+--------------------------------------+------+----------------------+-------------------+----------------------+--------+

If I migrate exactly the same VM directly from the hypervisor's console with the command "xe vm-migrate vm=instance-00000044 host=xcpcompute2 live=true", the downtime is only about 3 seconds, as you can see here:

Mon Jul 15 09:40:26 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.492 ms
Mon Jul 15 09:40:27 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.610 ms
Mon Jul 15 09:40:28 MDT 2013 64 bytes from 192.168.30.4: seq=0 ttl=64 time=0.753 ms
Mon Jul 15 ...
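For reference, the monitoring loop inside the VM is essentially of this shape (a minimal sketch, not necessarily the exact script; the target IP 192.168.30.4 is the one shown in the output above):

while true; do
    # print a timestamp and a single ping reply on the same line
    echo -n "$(date) "
    ping -c 1 -W 1 192.168.30.4 | grep 'bytes from'
    sleep 1
done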


8 answers

answered 2013-07-16 13:07:09 -0600 by johngarbutt

Ah, that would do it.

Good spot.

answered 2013-07-16 12:05:11 -0600 by cyberang3l

I used the statistical breakdown from iperf on the hypervisors to check the number of packets during the migration, and I noticed that, whether I migrate with the xe command or through OpenStack, the count of packets in the 1426-1500+ byte range increases for roughly the same amount of time.

I guess this is when the contents of the memory are transferred to the other hypervisor, so large TCP packets are sent in order to finish as quickly as possible.

So the migration process itself takes the same amount of time no matter how it is initiated (a reasonable conclusion).

The difference is that, for that whole period, the VM is unreachable when the migration is issued through OpenStack, whereas when it is issued with the xe command there is no such problem.

answered 2013-07-15 16:43:23 -0600 by johngarbutt

Sorry about the bugs; no one is testing the XCP pool integration at the moment. There has been talk of removing this functionality, so it would be good to better understand your use case for going down this path rather than using local storage, with all shared storage managed by Cinder. In addition, more help debugging/testing this setup would be very welcome!

Short answer: I have no idea; Nova is making the same call as the CLI: https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1890

We probably need a few more details on what networking setup you are using in Nova. I suspect it could be the wait for Nova to correctly apply the latest networking rules to the XenServer OVS, but I could be very wrong on that one.

answered 2013-07-15 16:44:27 -0600 by johngarbutt

Pressing the correct button this time - needs more info.

answered 2013-07-15 18:51:02 -0600 by cyberang3l

Thanks for the answer, John.

When you talk about using local storage, I guess you mean block migration, which needs support for the XenMotion feature, right? If so, as I said, I used XCP from the Debian repository and, as far as I have read, it has not been updated to XCP 1.6 yet, so I cannot use XenMotion. If not, can you please post some links to documentation on using local storage, with all shared storage managed by Cinder? How can this be configured?

A few more words about my setup:

My first attempt was to set up XCP with Quantum and OVS, but I concluded that this is not supported at the moment and will first be supported in the Havana release. I asked a question for clarification here but didn't get an answer: https://answers.launchpad.net/neutron...

Then I moved on and used nova-network instead, with FlatDHCP and bridges (no OVS), as described in the official documentation here: http://docs.openstack.org/grizzly/ope...

This works as advertised :)

The next step was live migration, which I just got working today.

As for helping with debugging/testing, I can do that, since I have the setup and need to work with it for a project. If you have any requests for more specific details, please let me know.

answered 2013-07-16 12:19:24 -0600 by cyberang3l

And I think I just found the problem:

If I initiate the migration from the console like this: "xe vm-migrate vm=instance-0000004a host=xcpcompute1" (notice there is no live=true), I get the same behaviour as when I migrate using OpenStack.

So I guess that OpenStack is not asking the hypervisor to perform a "live" migration, but just a plain migration.
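In other words, the difference boils down to these two invocations (using the same instance and host names as above):

# plain (non-live) migration: the VM is suspended while its memory is copied,
# which matches the long downtime seen with the OpenStack-initiated migration
xe vm-migrate vm=instance-0000004a host=xcpcompute1

# live migration: the VM keeps running during the copy, downtime drops to a few seconds
xe vm-migrate vm=instance-0000004a host=xcpcompute1 live=true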

answered 2013-07-16 12:58:30 -0600 by cyberang3l

And what fixes the problem for me is this:

--- nova-orig/virt/xenapi/vmops.py	2013-07-15 14:21:05.532868954 +0200
+++ nova/virt/xenapi/vmops.py	2013-07-16 14:54:10.865301101 +0200
@@ -1727,7 +1727,7 @@
             host_ref = self._get_host_opaque_ref(context,
                                                  destination_hostname)
             self._session.call_xenapi("VM.pool_migrate", vm_ref,
-                                      host_ref, {})
+                                      host_ref, { "live": "true" })
             post_method(context, instance, destination_hostname,
                         block_migration)
         except Exception:

answered 2013-07-16 13:36:16 -0600 by cyberang3l

Hi John,

Can you please elaborate a little more, or point me to documentation, on what you mentioned in your previous post about using local storage instead of shared storage, with all shared storage managed by Cinder?
