Instances with large images sometimes fail to start/spawn

OpenStack Kilo 2015.1.1 (Mirantis)
Glance with Swift as backend (running on controllers)

Creating an instance from a 1.1 GB CentOS 6.6 image fails consistently on every compute node except one, while a ~250 MB Ubuntu 14.04 image works every time. This setup used to work, so the failure may have been triggered by a configuration change or something else.

The traceback suggests that Glance is having problems. I tried restarting both the glance-api and glance-registry services, but that didn't help. In both Horizon and novaclient the instance is initially reported as 'building', and the output of nova --debug boot [..] looks OK (I also checked the API calls using curl).

I'm not sure what could be wrong, or where to look next.

Logs filtered for readability.

controller1:/var/log/nova/nova-conductor.log:

Error from last host: compute1 (compute1): [...] u'RescheduledException: Build of instance <hash> was rescheduled: HTTPInternalServerError (HTTP 500)\n'
Failed to compute_task_build_instances: No valid host was found. Exceeding max scheduling attempts 3 for instance <hash>. [...]
Setting instance to ERROR state.

compute1:/var/log/nova-compute.log:

INFO nova.compute.manager [...] Starting instance...
WARNING nova.compute.resource_tracker [...] Host field should not be set on the instance until resources have been claimed.
WARNING nova.compute.resource_tracker [...] Host field should not be set on the instance until resources have been claimed.
INFO nova.compute.claims [...] Attempting claim: memory 4096 MB, disk 10 GB
INFO nova.compute.claims [...] Total memory: ...
INFO nova.compute.claims [...] memory limit: ...
INFO nova.compute.claims [...] Total disk: ...
INFO nova.compute.claims [...] disk limit: ...
INFO nova.compute.claims [...] Claim successful
INFO nova.virt.libvirt.driver [...] Creating image
ERROR nova.compute.manager [...] Instance failed to spawn
TRACE nova.compute.manager [...] Traceback (most recent call last):

Traceback:

  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2471, in _build_resources
    yield resources
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2343, in _build_and_run_instance
    block_device_info=block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2405, in spawn
    admin_pass=admin_password)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2803, in _create_image
    instance, size, fallback_from_host)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5930, in _try_fetch_image_cache
    size=size)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 231, in cache
    *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 480, in create_image
    prepare_template(target=base, max_size=size, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 221, in fetch_func_sync
    fetch_func(target=target, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 507, in fetch_image
    max_size=max_size
  File "/usr/lib/python2.7/dist-packages/nova/virt/images.py", line 87, in fetch_to_raw
    max_size=max_size
  File "/usr/lib/python2.7/dist-packages/nova/virt/images.py", line 77, in fetch
    IMAGE_API.download(context, image_href, dest_path=path)
  File "/usr/lib/python2.7/dist-packages/nova/virt/api.py", line 182, in download
    dst_path=dest_path
  File "/usr/lib/python2.7/dist-packages/nova/image/glance.py", line 352, in download
    _reraise_translated_image_exception(image_id)
  File "/usr/lib/python2.7/dist-packages/nova/image/glance.py", line 350, in download
    image_chunks = self._client.call(context, 1, 'data', image_id)
  File "/usr/lib/python2.7/dist-packages/nova/image/glance.py", line 219, in call
    return getattr(client.images, method)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/glanceclient/v1/images.py", line 143, in data
    % urlparse.quote(str(image_id)))
  File "/usr/lib/python2.7/dist-packages/glanceclient/common/http.py", line 262, in get
    return self._request('GET', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/glanceclient/common/http.py", line 230, in _request
    raise exc.from_response(resp, resp.text)
HTTPInternalServerError: HTTPInternalServerError (HTTP 500)

Continued log:

INFO nova.compute.manager [...] Terminating instance
INFO nova.virt.libvirt.driver [...] During wait destroy, instance disappeared.
INFO nova.virt.libvirt.driver [...] Deleting instance files /var/lib/nova/instances/<hash>_del
INFO nova.virt.libvirt.driver [...] Deletion of /var/lib/nova/instances/<hash>_del complete
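
The traceback ends in glanceclient's v1 images.data call, which is a GET on /v1/images/<image id> that Glance answers with HTTP 500. One way to narrow this down might be to repeat that download outside Nova, from a failing compute node and from the one that works. The following is only a minimal sketch under assumptions: the function names are hypothetical, the endpoint and token are placeholders to be filled in from the deployment, and it uses Python 3's urllib rather than glanceclient itself.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_image_request(endpoint, image_id, token):
    """Build a GET for the image data, mirroring the v1 URL seen in the traceback."""
    url = "%s/v1/images/%s" % (
        endpoint.rstrip("/"),
        urllib.parse.quote(str(image_id)),
    )
    # Keystone token passed the usual way; obtaining it is out of scope here.
    return urllib.request.Request(url, headers={"X-Auth-Token": token})


def download_size(request, chunk_size=64 * 1024, opener=urllib.request.urlopen):
    """Stream the response and return (status, bytes_read) without storing the image."""
    try:
        resp = opener(request)
    except urllib.error.HTTPError as err:
        # Glance answered with an error status, e.g. the HTTP 500 from the traceback.
        return err.code, 0
    total = 0
    with resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
        return resp.status, total
```

A call would look like download_size(build_image_request("<glance endpoint>", "<image id>", "<token>")). If the 500 also shows up here for the 1.1 GB image but not for the small one, the problem is likely between glance-api and its Swift backend rather than in Nova, and the glance-api log on the controllers would be the next place to look.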

controller1:/var/log/nova/nova-scheduler.log:

Around the same time, I am also seeing these warnings from time to time:

WARNING nova.scheduler.host_manager [...] Host compute6 has more disk space than database expected (446gb > 438gb)
WARNING nova.scheduler.host_manager [...] Host compute1 has more disk space than database expected (237gb > 218gb)
WARNING nova.scheduler.host_manager [...] Host compute4 has more disk space than database expected (276gb > 123gb)
...
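
To see how far the hosts drift from what the database expects, the scheduler warnings can be reduced to a per-host difference. This is a rough sketch: the regular expression is written against the three warning lines quoted above and is an assumption about their format, and disk_gaps is a hypothetical helper, not part of Nova.

```python
import re

# Matches the scheduler warning format quoted in the log excerpt above.
WARNING_RE = re.compile(
    r"Host (?P<host>\S+) has more disk space than database expected "
    r"\((?P<reported>\d+)gb > (?P<expected>\d+)gb\)"
)


def disk_gaps(lines):
    """Return {host: reported_gb - expected_gb} for every matching warning line."""
    gaps = {}
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            reported = int(match.group("reported"))
            expected = int(match.group("expected"))
            gaps[match.group("host")] = reported - expected
    return gaps


sample = [
    "WARNING nova.scheduler.host_manager [...] Host compute6 has more disk space than database expected (446gb > 438gb)",
    "WARNING nova.scheduler.host_manager [...] Host compute1 has more disk space than database expected (237gb > 218gb)",
    "WARNING nova.scheduler.host_manager [...] Host compute4 has more disk space than database expected (276gb > 123gb)",
]

# Print the hosts with the largest gap first.
for host, gap in sorted(disk_gaps(sample).items(), key=lambda item: -item[1]):
    print("%s: %d GB more than expected" % (host, gap))
```

For the lines above this gives gaps of 153 GB for compute4, 19 GB for compute1 and 8 GB for compute6. Whether these warnings are related to the failed image download is not clear from the logs.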