RetryFilter: What to do with a failed host?

asked 2015-04-28 19:09:14 -0500

If I understand RetryFilter correctly, a host that once failed will be filtered out forever. Is that correct? If so, how can I convince the filter that the host is available again?

By the way, on my Juno installation on CentOS 7, I find the following in nova-scheduler.log:

Host [u'compute1', u'compute1'] fails.  Previously tried hosts: [[u'compute1', u'compute1']] host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/

It's a bit unclear to me what that means. First it says that compute1 fails, then it says that it passes. Which of the two?

answered 2017-11-14 23:11:50 -0500

Sam Song

The log doesn't mean that compute1 ever passed, just that it was tried and failed. I have the same problem as you. Did you find out how to convince the filter that the host is available again?

Today I know more than I did 2.5 years ago. It works as follows: the scheduler selected compute1, the instance launch on compute1 failed, and the scheduler tried again, but this time RetryFilter excluded compute1 to avoid an endless loop.

At the next instance launch, compute1 is again included in the list.
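
The behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the Nova source: the standalone `host_passes` function and the plain-dict `filter_properties` layout are assumptions made for the example. It also suggests that the trailing `host_passes` in the log line is most likely just the name of the function that wrote the message, not a second verdict.

```python
def host_passes(host, nodename, filter_properties):
    """Return True if this host has not yet been tried for the current request.

    Hypothetical stand-in for the filter check; names and data layout are
    assumptions for illustration only.
    """
    retry = filter_properties.get("retry")
    if not retry:
        # No retry information attached to the request: nothing is excluded.
        return True
    tried_hosts = retry.get("hosts", [])
    # A host is rejected only if it was already tried for THIS request.
    return [host, nodename] not in tried_hosts


# First attempt of a request: no host has been tried yet.
first_attempt = {"retry": {"hosts": []}}
# Second attempt of the same request, after the launch on compute1 failed.
second_attempt = {"retry": {"hosts": [["compute1", "compute1"]]}}
# A new, unrelated request starts with an empty list again.
new_request = {"retry": {"hosts": []}}

print(host_passes("compute1", "compute1", first_attempt))   # True
print(host_passes("compute1", "compute1", second_attempt))  # False
print(host_passes("compute2", "compute2", second_attempt))  # True
print(host_passes("compute1", "compute1", new_request))     # True
```

The list of tried hosts belongs to a single scheduling request, so in this sketch nothing needs to be reset by hand: a new request simply starts with an empty list.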

Bernd Bausch ( 2017-11-21 17:59:09 -0500 )

answered 2015-04-29 00:54:44 -0500

dbaxps

updated 2015-04-29 00:55:55 -0500

This might be Nova's reaction to a KVM/libvirt failure to start the instance on the compute node.

1. Check that `nova hypervisor-list` lists the compute node names
2. Check that openstack-nova-compute is up and running on the compute nodes
3. Check the instance logs under /var/log/libvirt/qemu (or /var/log/libvirt/libxl in the case of Xen)

In my experience with the nova libvirt-xen driver, the real errors to troubleshoot came from the libvirtd daemon on the compute node. Nova just reported "unable schedule instance".

