nova libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or resource busy

Hello,

When I launch several VMs on one host, one of them fails to spawn with the error below:

[nova-compute.log]

2015-05-11 17:16:11.613 5273 ERROR nova.compute.manager [-] [instance: 6197cd88-6486-4776-b26f-280951b09716] Instance failed to spawn
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716] Traceback (most recent call last):
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2243, in _build_resources
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]     yield resources
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2113, in _build_and_run_instance
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]     block_device_info=block_device_info)
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2643, in spawn
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716]     raise ex
2015-05-11 17:16:11.613 5273 TRACE nova.compute.manager [instance: 6197cd88-6486-4776-b26f-280951b09716] libvirtError: Unable to add bridge brq17c6134d-42 port tapaab35204-b5: Device or resource busy

At the same time, libvirt logs the same error:

[/var/log/libvirt/libvirtd.log]

2015-05-11 08:15:51.969+0000: 4145: warning : virGetGroupIDByName:938 : Group record for user '107' was not found: No such file or directory
2015-05-11 08:16:07.983+0000: 4144: error : virNetDevBridgeAddPort:374 : Unable to add bridge brq17c6134d-42 port tapaab35204-b5: Device or resource busy
2015-05-11 08:16:14.298+0000: 4144: warning : qemuDomainObjTaint:1558 : Domain id=16 name='VMME12_VIPA0' uuid=e1d4e7bd-6de3-4c3c-9fe3-db605b0182a8 is tainted: host-cpu
2015-05-11 08:16:14.298+0000: 4144: warning : virGetUserIDByName:858 : User record for user '107' was not found: No such file or directory

Here is my test environment:

  • Juno-2014.2.1-1 (RDO)
  • RHEL7 (kernel 3.10.0-229.el7.x86_64)
  • libvirt-1.1.1-29.el7_0.4.x86_64
  • 1 controller + 2 compute hosts (one of them doubles as the network node)

I ran into this issue recently. It is quite annoying: it is hard to reproduce, but once it has happened it keeps recurring. I suspect the error only occurs when the Linux bridge (LB) agent is used for Neutron; OVS does not seem to be affected.

I found a similar report on bugs.launchpad.net, but no one has answered it (https://bugs.launchpad.net/nova/+bug/1312016). So I started looking into the error and found some suspicious code in the Neutron agent.

[neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py:372]

    def add_tap_interface(self, network_id, network_type, physical_network,
                          segmentation_id, tap_device_name):
        """Add tap interface.

        If a VIF has been plugged into a network, this function will
        add the corresponding tap device to the relevant bridge.
        """
----------((fold))------------------
        # Check if device needs to be added to bridge
        tap_device_in_bridge = self.get_bridge_for_tap_device(tap_device_name)
        if not tap_device_in_bridge:
            data = {'tap_device_name': tap_device_name,
                    'bridge_name': bridge_name}
            LOG.debug("Adding device %(tap_device_name)s to bridge "
                      "%(bridge_name)s", data)
            if utils.execute(['brctl', 'addif', bridge_name, tap_device_name],
                             run_as_root=True):
                return False
        else:
            data = {'tap_device_name': tap_device_name,
                    'bridge_name': bridge_name}
            LOG.debug("%(tap_device_name)s already exists on bridge "
                      "%(bridge_name)s", data)
        return True

When I removed the code above (the if not ... else block), the error disappeared. I suspect this portion of the code can race with libvirt. When Nova creates a VM, libvirt creates the tap port and adds it to the bridge. If Neutron (the code above) manages to add the interface to the bridge just before libvirt tries to, libvirt's attempt fails.
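
To make the suspected race concrete, here is a toy Python model of it. Everything in it is hypothetical: FakeBridgeTable, ensure_port and replay_race are names I made up for illustration, not Neutron, Nova or libvirt code. It assumes that adding a device which is already a bridge port is rejected with EBUSY, which is what the libvirt log above suggests.

```python
import errno


class FakeBridgeTable:
    """Toy stand-in for the kernel's bridge membership table (hypothetical)."""

    def __init__(self):
        self.port_to_bridge = {}

    def add_port(self, bridge, port):
        # Assumption taken from the log above: enslaving a device that is
        # already a bridge port fails with EBUSY.
        if port in self.port_to_bridge:
            raise OSError(errno.EBUSY, "Device or resource busy")
        self.port_to_bridge[port] = bridge


def ensure_port(table, bridge, port):
    """Add port to bridge, treating 'already on this bridge' as success."""
    try:
        table.add_port(bridge, port)
    except OSError as exc:
        already_there = table.port_to_bridge.get(port) == bridge
        if exc.errno == errno.EBUSY and already_there:
            return True
        raise
    return True


def replay_race():
    """Replay the suspected ordering: the agent adds the tap first."""
    table = FakeBridgeTable()
    bridge, tap = "brq17c6134d-42", "tapaab35204-b5"
    table.add_port(bridge, tap)      # Neutron agent wins the race
    try:
        table.add_port(bridge, tap)  # second, strict add (libvirt's role)
        strict = "ok"
    except OSError as exc:
        strict = errno.errorcode[exc.errno]
    tolerant = ensure_port(table, bridge, tap)
    return strict, tolerant
```

In this model the second strict add fails, while a caller that accepts "already on the wanted bridge" as success does not. Whether libvirt or the agent could be made to behave that way is exactly the part I am unsure about.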

So now I'm wondering what this code is for. Can I simply remove it?

Could someone who is an expert in Neutron please shed some light on this?