How stable is stable/diablo?

I have a long-running (several months) cluster based on stable/diablo, Ubuntu 11.10, and KVM, configured similarly to TryStack. It usually works fine (and I run devstack on the VMs), but several times another user has reported losing SSH connectivity to various VMs. When they try to reboot them with nova reboot, the VMs get stuck in the REBOOT state as reported by nova list, even though some are actually still running. When I go to the compute nodes I usually, but not always, find that 'virsh list' hangs. Sometimes restarting libvirt fixes the problem, and sometimes I have to reboot the compute nodes. After that, most VMs recover once they are rebooted. I don't feel I could deploy this technology in production, and I wonder what the TryStack experience has been, or whether anyone has run diablo for a long period without these kinds of problems.

I don't know whether these issues are hypervisor-specific either. This has happened 3 or 4 times since the cluster started running.

1 answer

It's not a full answer, but when an instance hangs you can use euca-reboot-instance, or stop the nova-compute service, run virsh destroy <domain_name>, then virsh start <domain_name>, and start nova-compute again. (This is sometimes better than restarting all of libvirt or the whole node.) I haven't noticed any SSH problems for months, though there were several node restarts, planned and otherwise. Another thing to watch is when nova/instances lives on an NFS directory; that has not worked all that well for me. Sometimes the instance OS may also go to sleep or something similar; I have seen this with a Windows VM.
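
The stop/destroy/start sequence above could be wrapped roughly like this. It is only a sketch under a few assumptions: that nova-compute is controlled with `service nova-compute stop|start` on the compute node (adjust for your packaging), and that you already know the libvirt domain name. The `run` and `recover_domain` helpers and the `DRY_RUN` variable are made up for illustration and are not part of nova or libvirt.

```shell
#!/bin/sh
# Sketch: recover one hung instance without restarting libvirt or the node.
# Assumption: nova-compute is managed through `service`.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_domain() {
    domain="$1"
    if [ -z "$domain" ]; then
        echo "usage: recover_domain <domain_name>" >&2
        return 2
    fi

    # Stop nova-compute first so it does not act on the domain meanwhile.
    run service nova-compute stop || return 1

    # Hard power-off; this may fail if the domain is already shut off,
    # so the result is deliberately ignored.
    run virsh destroy "$domain"

    # Boot the domain again and remember whether that worked.
    run virsh start "$domain"
    rc=$?

    # Bring nova-compute back even if the domain failed to start.
    run service nova-compute start
    return "$rc"
}
```

Running `DRY_RUN=1; recover_domain <domain_name>` first shows the exact commands before anything is touched. Note that virsh destroy is a hard power-off, so the guest gets no chance to shut down cleanly.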

