What about "controller" migration?

Exactly that. I can't seem to find this anywhere (I'm really sorry if the answer already exists and I couldn't find it). Suppose a controller fails (hardware failure). We had the other necessary services available on one of the compute nodes, so here is what we did:

1 disabled the network & scheduler services on the already dead controller using "nova-manage service disable"

2 modified the nova.conf file (all our compute nodes use the same nova.conf via an NFS share) to point to the "new" controller

3 started the network & scheduler services on the "new" controller

4 ran "nova-manage service list" to check that everything was fine:

melicloud@compute10:~$ sudo nova-manage service list
compute10       nova-scheduler  enabled   :-)  2011-08-11 03:17:44
compute10       nova-network    enabled   :-)  2011-08-11 03:17:46
compute10       nova-compute    enabled   :-)  2011-08-11 03:17:37
compute11       nova-compute    enabled   :-)  2011-08-11 03:17:37
deadcontroller  nova-scheduler  disabled  XXX  2011-07-21 21:53:11
deadcontroller  nova-network    disabled  XXX  2011-07-21 21:53:00
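For anyone who wants to check this kind of listing automatically, here is a minimal Python sketch that parses the output above and reports which hosts have a live, enabled copy of a given service. It assumes the five-column layout shown in this post (host, service, status, state, timestamp); other nova releases may print different columns. The names ServiceRow, parse_service_list, hosts_running and dead_but_enabled are made up for this example and are not part of nova.

```python
from typing import List, NamedTuple


class ServiceRow(NamedTuple):
    host: str
    service: str
    status: str      # "enabled" or "disabled"
    state: str       # ":-)" means a recent heartbeat, "XXX" means none
    updated_at: str  # date and time of the last heartbeat


# Sample copied from the listing in this post.
SAMPLE = """\
compute10 nova-scheduler enabled :-) 2011-08-11 03:17:44
compute10 nova-network enabled :-) 2011-08-11 03:17:46
compute10 nova-compute enabled :-) 2011-08-11 03:17:37
compute11 nova-compute enabled :-) 2011-08-11 03:17:37
deadcontroller nova-scheduler disabled XXX 2011-07-21 21:53:11
deadcontroller nova-network disabled XXX 2011-07-21 21:53:00
"""


def parse_service_list(text: str) -> List[ServiceRow]:
    """Parse the assumed layout: host service status state date time."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the shell prompt, blank lines, and anything else that
        # does not have exactly six whitespace-separated fields.
        if len(fields) != 6:
            continue
        host, service, status, state, date, time = fields
        if status not in ("enabled", "disabled"):
            continue
        rows.append(ServiceRow(host, service, status, state, f"{date} {time}"))
    return rows


def hosts_running(rows: List[ServiceRow], service: str) -> List[str]:
    """Hosts where `service` is enabled and has a recent heartbeat."""
    return sorted(
        {r.host for r in rows
         if r.service == service and r.status == "enabled" and r.state == ":-)"}
    )


def dead_but_enabled(rows: List[ServiceRow]) -> List[ServiceRow]:
    """Services still enabled although their heartbeat has stopped."""
    return [r for r in rows if r.status == "enabled" and r.state != ":-)"]


if __name__ == "__main__":
    rows = parse_service_list(SAMPLE)
    print("nova-network hosts:", hosts_running(rows, "nova-network"))
    print("enabled but dead:", dead_but_enabled(rows))
```

This only confirms what the service table reports. It does not show which nova-network host a compute node will actually contact.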

We then restarted the libvirt-bin & nova-compute services on all nodes. And what happened? When we try to start a new instance, the new scheduler selects a compute node, but on the compute node side we see that it keeps trying to connect to the nova-network service on "deadcontroller" to get an available IP address for the instance, so the process fails.

The thing is, we didn't find anything in the nova database (MySQL) that indicates we did something wrong. So, has anyone had the chance to migrate a failed controller, or can anyone guess what we are doing wrong?

Best regards.