After a successful live migration, instance is still in migrating status

asked 2016-09-15 11:30:36 -0500 by fifi

updated 2016-09-15 11:33:05 -0500

Hi,

I successfully block-live-migrated an instance. I checked the nova-compute log files on both the source and destination compute nodes; they show that the migration was successful. virsh list on the destination compute node also shows the migrated VM running there, and I can ping the migrated VM and SSH into it. However, in Horizon the migrated VM is still in migrating status. I have restarted all nova and horizon services on the controller and it didn't help. If I restart nova-compute or libvirt-bin on the compute nodes, the VM goes to the error state and I then have to terminate it. I would be grateful if anyone could let me know how I can solve this problem.
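
For reference, the stuck state can also be confirmed from the API. Here is a minimal python-novaclient sketch (Juno-era v2 client; the credentials, endpoint, and instance name are placeholders, and the OS-EXT-STS attribute assumes the extended-status extension is enabled, which it is by default):

from novaclient import client

# Placeholders: substitute your own credentials, tenant, and endpoint.
nova = client.Client('2', 'admin', 'ADMIN_PASS', 'admin',
                     'http://controller:5000/v2.0')

server = nova.servers.find(name='my-instance')
print(server.status)                             # stays MIGRATING
print(getattr(server, 'OS-EXT-STS:task_state'))  # stays 'migrating'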


Comments

what version?

darragh-oreilly ( 2016-09-15 12:34:06 -0500 )

It's Juno.

fifi ( 2016-09-15 12:51:48 -0500 )

can you try nova reset-state --active <server>

darragh-oreilly ( 2016-09-16 10:00:49 -0500 )

I got this error:

Reset state for server compute1 failed: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-9b951c20-6711-4b1a-bc48-04e9761ef9a6)
ERROR (CommandError): Unable to reset the state for the specified server(s)
fifi ( 2016-09-16 12:54:48 -0500 )

I also ran it for my second compute node and got this error:

Reset state for server compute2 failed: No server with a name or ID of 'compute2' exists.
ERROR (CommandError): Unable to reset the state for the specified server(s).
fifi ( 2016-09-16 12:57:35 -0500 )
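
Note that nova reset-state expects the name or UUID of an instance, not of a compute node, which is why the second attempt failed with "No server with a name or ID of 'compute2' exists." The equivalent call with python-novaclient looks like this (a minimal sketch; credentials and names are placeholders):

from novaclient import client

nova = client.Client('2', 'admin', 'ADMIN_PASS', 'admin',
                     'http://controller:5000/v2.0')

# Pass the instance, not the compute host it runs on.
server = nova.servers.find(name='my-instance')
nova.servers.reset_state(server, 'active')   # same as: nova reset-state --active <server>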

1 answer


answered 2016-09-20 16:42:19 -0500 by fifi

updated 2016-10-13 11:26:58 -0500

I found out that this problem is caused by nova-conductor. The instance gets stuck in the migrating task_state when an unexpected exception happens, because the generic except Exception handler in the conductor's _live_migrate only logs the failure and raises MigrationError without resetting the VM state, so nova-conductor never sets the instance back to a proper state.

To fix the problem I followed a patch offered here.

Here is my manager.py file (located at /usr/lib/python2.7/dist-packages/nova/conductor/) before and after applying this patch.

1- Before applying the patch:

    def _live_migrate(self, context, instance, scheduler_hint,
                      block_migration, disk_over_commit):
        destination = scheduler_hint.get("host")
        try:
            live_migrate.execute(context, instance, destination,
                                 block_migration, disk_over_commit)
        except (exception.NoValidHost,
                exception.ComputeServiceUnavailable,
                exception.InvalidHypervisorType,
                exception.InvalidCPUInfo,
                exception.UnableToMigrateToSelf,
                exception.DestinationHypervisorTooOld,
                exception.InvalidLocalStorage,
                exception.InvalidSharedStorage,
                exception.HypervisorUnavailable,
                exception.InstanceNotRunning,
                exception.MigrationPreCheckError,
                exception.LiveMigrationWithOldNovaNotSafe) as ex:
            with excutils.save_and_reraise_exception():
                # TODO(johngarbutt) - eventually need instance actions here
                request_spec = {'instance_properties': {
                    'uuid': instance['uuid'], },
                }
                scheduler_utils.set_vm_state_and_notify(context,
                        'compute_task', 'migrate_server',
                        dict(vm_state=instance['vm_state'],
                             task_state=None,
                             expected_task_state=task_states.MIGRATING,),
                        ex, request_spec, self.db)
        except Exception as ex:
            LOG.error(_('Migration of instance %(instance_id)s to host'
                       ' %(dest)s unexpectedly failed.'),
                       {'instance_id': instance['uuid'], 'dest': destination},
                       exc_info=True)
            raise exception.MigrationError(reason=ex)

    def build_instances(self, context, instances, image, filter_properties,
            admin_password, injected_files, requested_networks,
            security_groups, block_device_mapping=None, legacy_bdm=True):

2- After applying the patch:

    def _live_migrate(self, context, instance, scheduler_hint,
                      block_migration, disk_over_commit):
        destination = scheduler_hint.get("host")

        ############################### ADDED ###############################
        # Helper that notifies the scheduler and resets the instance state.
        # It is nested inside _live_migrate, so self.db is reached through
        # the closure of the enclosing method.
        def _set_vm_state(context, instance, ex, vm_state=None,
                          task_state=None):
            request_spec = {'instance_properties': {
                'uuid': instance['uuid'], },
            }
            scheduler_utils.set_vm_state_and_notify(context,
                    'compute_task', 'migrate_server',
                    dict(vm_state=vm_state,
                         task_state=task_state,
                         expected_task_state=task_states.MIGRATING,),
                    ex, request_spec, self.db)
        ################################ END #################################

        try:
            live_migrate.execute(context, instance, destination,
                                 block_migration, disk_over_commit)
        except (exception.NoValidHost,
                exception.ComputeServiceUnavailable,
                exception.InvalidHypervisorType,
                exception.InvalidCPUInfo,
                exception.UnableToMigrateToSelf,
                exception.DestinationHypervisorTooOld,
                exception.InvalidLocalStorage,
                exception.InvalidSharedStorage,
                exception.HypervisorUnavailable,
                exception.InstanceNotRunning,
                exception.MigrationPreCheckError,
                exception.LiveMigrationWithOldNovaNotSafe) as ex:
            with excutils.save_and_reraise_exception():
                # TODO(johngarbutt) - eventually need instance actions here

                ############################## ADDED ##############################
                # Known failure: keep the old vm_state and clear the
                # 'migrating' task_state (task_state defaults to None).
                _set_vm_state(context, instance, ex, instance['vm_state'])
                ############################### END ###############################


                ############################# REMOVED #############################
                #request_spec = {'instance_properties': {
                #    'uuid': instance['uuid'], },
                #}
                #scheduler_utils.set_vm_state_and_notify(context,
                #        'compute_task', 'migrate_server',
                #        dict(vm_state=instance['vm_state'],
                #             task_state=None,
                #             expected_task_state=task_states.MIGRATING,),
                #        ex, request_spec, self.db)
                ############################### END ###############################

        except Exception as ex:
            LOG.error(_('Migration of instance %(instance_id)s to host'
                       ' %(dest)s unexpectedly failed.'),
                       {'instance_id': instance['uuid'], 'dest': destination},
                       exc_info=True)

            ############################## ADDED ##############################
            # Unexpected failure: put the instance into ERROR instead of
            # leaving it stuck with a 'migrating' task_state.
            _set_vm_state(context, instance, ex, vm_states.ERROR,
                          instance['task_state'])
            ############################### END ###############################

            raise exception.MigrationError(reason=ex)

    def build_instances(self, context, instances, image, filter_properties,
            admin_password, injected_files, requested_networks,
            security_groups, block_device_mapping=None, legacy_bdm=True):

After applying that, I restarted nova-api and nova-conductor. I then ran block live migration 10 consecutive times for different VMs and didn't hit the problem anymore. There is just one exception: if the keystone token expires in the middle of the migration process, the problem occurs regardless of all the above-mentioned fixes.
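
For clarity, here is the pattern the patch implements, reduced to a self-contained sketch (the names and exception types are illustrative, not nova's):

import logging

LOG = logging.getLogger(__name__)

# Stands in for nova's tuple of known migration exceptions.
KNOWN_ERRORS = (ValueError, LookupError)


def set_vm_state(instance, ex, vm_state, task_state=None):
    # Stands in for scheduler_utils.set_vm_state_and_notify().
    instance['vm_state'] = vm_state
    instance['task_state'] = task_state


def live_migrate(instance, do_migrate):
    instance['task_state'] = 'migrating'
    try:
        do_migrate()
        instance['task_state'] = None
    except KNOWN_ERRORS as ex:
        # Expected failure: keep the old vm_state, clear the task_state.
        set_vm_state(instance, ex, instance['vm_state'])
        raise
    except Exception as ex:
        LOG.error('migration unexpectedly failed', exc_info=True)
        # The added behaviour: without this call the instance would stay
        # in the 'migrating' task_state forever.
        set_vm_state(instance, ex, 'error', instance['task_state'])
        raise RuntimeError(ex)

# Example: live_migrate({'vm_state': 'active', 'task_state': None},
#                       lambda: 1 / 0)
# ZeroDivisionError is not a known error, so the instance ends up in
# vm_state 'error' instead of being stuck in 'migrating'.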



Stats

Asked: 2016-09-15 11:30:36 -0500

Seen: 773 times

Last updated: Oct 13 '16