Who is responsible for 'deleting' volume attachments when attach_volume fails

asked 2015-10-27 13:10:12 -0500

dmunro

updated 2015-10-28 13:20:51 -0500

Hi all,

I'm hoping someone might be able to help clarify a scenario I came across while debugging an issue with attaching a volume to an instance. Specifically, in a Nova-initiated workflow to attach the volume, I'm not clear on who should be deleting the attachment records that are created by Cinder's attach_volume() if the underlying driver raises an exception:

cinder/cinder/volume/manager.py:

attach_volume()
…
attachment = self.db.volume_attach(context.elevated(), values)
...
try:
    # NOTE(flaper87): Verify the driver is enabled
    # before going forward. The exception will be caught
    # and the volume status updated.
    utils.require_driver_initialized(self.driver)

    self.driver.attach_volume(context,
                              volume,
                              instance_uuid,
                              host_name_sanitized,
                              mountpoint)
except Exception:
    with excutils.save_and_reraise_exception():
        self.db.volume_attachment_update(
            context, attachment_id,
            {'attach_status': 'error_attaching'})

In the above snippet, note that the volume_attachment record is created as part of the overall flow, but if the driver raises an exception only the 'attach_status' of the volume_attachment is updated. The volume_attachment record itself is never (marked as) deleted, which raises the question: who should be doing so?
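
For what it's worth, one option I could imagine (purely a sketch on my part, not a claim about the intended design) is for Cinder's own exception handler to clean up the record it just created. Assuming the db layer's volume_detached() call, which the regular detach path uses, marks the attachment row as deleted, it would look roughly like:

except Exception:
    with excutils.save_and_reraise_exception():
        self.db.volume_attachment_update(
            context, attachment_id,
            {'attach_status': 'error_attaching'})
        # Sketch only: also remove the attachment record this flow
        # created, so it no longer shows up in 'by volume' queries.
        # Assumes volume_detached() marks the row as deleted the
        # same way the normal detach path does.
        self.db.volume_detached(context.elevated(), volume['id'],
                                attachment_id)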

From what I am seeing this becomes an issue because the attach_volume call originates in Nova, and the exception handling there suffers from the fact that the failed volume_attachment record still shows up in 'by volume' queries. Consider the following snippet:

nova/nova/compute/manager.py:

def _attach_volume(self, context, instance, bdm):
    context = context.elevated()
    LOG.info(_LI('Attaching volume %(volume_id)s to %(mountpoint)s'),
             {'volume_id': bdm.volume_id,
              'mountpoint': bdm['mount_device']},
             context=context, instance=instance)
    try:
        bdm.attach(context, instance, self.volume_api, self.driver,
                   do_check_attach=False, do_driver_attach=True)
    except Exception:
        with excutils.save_and_reraise_exception():
            LOG.exception(_LE("Failed to attach %(volume_id)s "
                              "at %(mountpoint)s"),
                          {'volume_id': bdm.volume_id,
                           'mountpoint': bdm['mount_device']},
                          context=context, instance=instance)
            self.volume_api.unreserve_volume(context, bdm.volume_id)

In the above, bdm.attach() ultimately results in an API call to Cinder that triggers the attach_volume() execution noted first above. If any exception is encountered in Cinder, the exception handling here makes another call to Cinder, unreserve_volume(). It is here that there seems to be some confusion (perhaps just on my part) as to which volume_attachment records should still exist.
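
If the expectation is that Nova owns the cleanup, I would have expected the handler above to do something more than just unreserve. A rough sketch of what I mean (not a proposed fix; I'm assuming volume_api.detach() would remove the failed attachment record on the Cinder side, and I'm not even sure Cinder would accept a detach while the volume is still 'attaching'):

except Exception:
    with excutils.save_and_reraise_exception():
        LOG.exception(_LE("Failed to attach %(volume_id)s "
                          "at %(mountpoint)s"),
                      {'volume_id': bdm.volume_id,
                       'mountpoint': bdm['mount_device']},
                      context=context, instance=instance)
        # Sketch only: ask Cinder to tear down the failed attachment
        # before unreserving, so the leftover record does not make
        # unreserve_volume() conclude the volume is 'in-use'.
        self.volume_api.detach(context, bdm.volume_id)
        self.volume_api.unreserve_volume(context, bdm.volume_id)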

cinder/cinder/volume/api.py:

def unreserve_volume(self, context, volume):
    volume = self.db.volume_get(context, volume['id'])
    if volume['status'] == 'attaching':
        attaches = self.db.volume_attachment_get_used_by_volume_id(
            context, volume['id'])
        if attaches:
            self.update(context, volume, {"status": "in-use"})
        else:
            self.update(context, volume, {"status": "available"})

Note that this function begins with a query for all attachments on the given volume. That result set will include the attachment record that was just marked with an attach_status of 'error_attaching', and because the result set is non-empty the status of the volume will be set to 'in-use'. Is this expected? Is the volume really 'in-use' when the attempt to attach it failed?
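
Alternatively, if the record is meant to stay around with attach_status='error_attaching', I would have expected unreserve_volume() to ignore it. Something along these lines is what I had in mind (again, just a sketch to illustrate the question, not a patch I'm proposing):

def unreserve_volume(self, context, volume):
    volume = self.db.volume_get(context, volume['id'])
    if volume['status'] == 'attaching':
        attaches = self.db.volume_attachment_get_used_by_volume_id(
            context, volume['id'])
        # Sketch only: discount attachments that failed, so a volume
        # whose only attachment errored out goes back to 'available'
        # rather than 'in-use'.
        good_attaches = [a for a in attaches
                         if a['attach_status'] != 'error_attaching']
        if good_attaches:
            self.update(context, volume, {"status": "in-use"})
        else:
            self.update(context, volume, {"status": "available"})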

That leaves me with a couple of questions:

  • Should the volume attachment record be marked as deleted when a failure is encountered? If so, what component ...