libVirtError when trying to launch multiple VMs with vGPU

asked 2019-04-04 19:03:55 -0500

andiariffin

Hi Community,

By following this guide: https://docs.openstack.org/nova/queens/admin/virtual-gpu.html, I was able to launch a VM with an NVIDIA vGPU from the Dashboard.

However, when I tried to launch VMs with Count > 1, some of them failed because their mediated device was already in use by another instance (see the following log).

This seems to be caused by a race condition when picking an available mdev (vGPU) device, because the error never occurs when I provision just one VM or set max_concurrent_builds to 1 in nova.conf.
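The suspected race is a classic check-then-act problem: each concurrent build looks for a mediated device that no instance holds yet and only afterwards marks it as used, so two builds can pick the same UUID in between. The Python sketch below models this under a simplified assumption; MdevPool, claim_unsafe and claim are hypothetical names for illustration, not nova's code. Holding a lock only around the pick-and-mark step gives each concurrent build a distinct device, which is a much narrower serialization than max_concurrent_builds = 1.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MdevPool:
    """Hypothetical model of one host's vGPU mediated devices (not nova code)."""

    def __init__(self, uuids):
        self._all = list(uuids)
        self._in_use = set()
        self._lock = threading.Lock()

    def _pick_free(self):
        # "Check": find a device that no instance holds yet.
        for uuid in self._all:
            if uuid not in self._in_use:
                return uuid
        raise RuntimeError("no free mediated device")

    def claim_unsafe(self):
        # Check-then-act without a lock: two concurrent builds can both pick
        # the same UUID before either one marks it as used.
        uuid = self._pick_free()
        self._in_use.add(uuid)  # "Act": mark the device as used.
        return uuid

    def claim(self):
        # Same two steps, but holding the lock makes them atomic, so
        # concurrent callers always receive distinct devices.
        with self._lock:
            return self.claim_unsafe()


pool = MdevPool([f"mdev-{i}" for i in range(16)])
with ThreadPoolExecutor(max_workers=8) as executor:
    claimed = list(executor.map(lambda _: pool.claim(), range(16)))
print(len(set(claimed)))  # 16 distinct devices for 16 concurrent builds
```

A threading.Lock only protects callers inside a single process, so this illustrates the idea rather than offering a drop-in fix for nova-compute.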

/var/log/nova/nova-compute.log:

2019-04-05 00:43:48.954 27825 ERROR nova.compute.manager [req-89e2afca-e612-4f36-a253-5c0114cfc70e 772295addc9949988b36057fee3b31c7 57e37d5723cd4e69852537d2e91df159 - default default] [instance: dcede179-4002-45f1-9fe1-9d08b895ed64] Instance failed to spawn: libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/93504417-62bc-4199-bdb3-31ca3f3f6bdf is in use by driver QEMU, domain instance-000001fc
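To see which mediated devices are being contended, the failures can be grouped by mdev UUID. A minimal Python sketch, assuming each entry occupies a single line in the real log file and that the message wording matches the libvirtError quoted in this question (other nova or libvirt versions may phrase it differently); mdev_conflicts is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Pattern derived from the libvirtError message quoted in the question.
MDEV_IN_USE = re.compile(
    r"\[instance: (?P<instance>[0-9a-f-]{36})\].*?"
    r"mediated device /sys/bus/mdev/devices/(?P<mdev>[0-9a-f-]{36}) "
    r"is in use by driver QEMU, domain (?P<domain>\S+)"
)


def mdev_conflicts(lines):
    """Group 'mediated device ... is in use' failures by mdev UUID.

    Returns {mdev_uuid: [(failed_instance_uuid, holding_domain), ...]}.
    """
    conflicts = defaultdict(list)
    for line in lines:
        match = MDEV_IN_USE.search(line)
        if match:
            conflicts[match.group("mdev")].append(
                (match.group("instance"), match.group("domain"))
            )
    return dict(conflicts)
```

On the compute node it could be called as mdev_conflicts(open("/var/log/nova/nova-compute.log")). Several failed instances reporting the same mdev UUID would be consistent with the suspected race.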

This bug report: https://bugs.launchpad.net/nova/+bug/1780225 might be relevant, since it discusses a similar problem. However, its temporary workaround of setting max_concurrent_builds to 1 is not acceptable for me: I usually provision many VMs at once (more than 16), and serializing the builds would make the whole provisioning process take far too long.

Is there any alternative solution to this issue so far?

I also tried setting max_attempts in nova.conf to a large value, such as 16 or even 1600, hoping that at least the failed instances would be retried, but it does not seem to work: the failed instances appear to be attempted only once.
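One thing worth ruling out is that each option sits in the section nova actually reads. As far as I know, in Queens max_concurrent_builds is a [DEFAULT] option, while max_attempts lives under [scheduler] and replaced the deprecated [DEFAULT] scheduler_max_attempts; treat that layout as an assumption to check against the Queens configuration reference. The Python sketch below, using a hypothetical build_settings helper, reports where each of these options is set:

```python
import configparser


def build_settings(conf_text):
    """Report where the build/retry options from this question are set."""
    parser = configparser.ConfigParser(
        interpolation=None,  # nova.conf values may contain '%' (e.g. log formats)
        strict=False,        # tolerate repeated keys instead of raising
        # Treat [DEFAULT] as an ordinary section so its keys do not leak into
        # [scheduler] lookups, which is configparser's normal behaviour.
        default_section="configparser-unused-default",
    )
    parser.read_string(conf_text)

    def value(section, option):
        # fallback=None also covers a missing section.
        return parser.get(section, option, fallback=None)

    return {
        # Per-compute-node limit on concurrent builds.
        "DEFAULT/max_concurrent_builds": value("DEFAULT", "max_concurrent_builds"),
        # Assumption: the Queens location of the retry limit.
        "scheduler/max_attempts": value("scheduler", "max_attempts"),
        # Assumption: the deprecated older name for the same option.
        "DEFAULT/scheduler_max_attempts": value("DEFAULT", "scheduler_max_attempts"),
        # A possibly misplaced copy worth double-checking.
        "DEFAULT/max_attempts": value("DEFAULT", "max_attempts"),
    }


sample = """[DEFAULT]
max_concurrent_builds = 10
max_attempts = 16
logging_context_format_string = %(asctime)s %(message)s

[scheduler]
"""
print(build_settings(sample))
```

In this sample, max_attempts is only set under [DEFAULT], so the [scheduler] entry comes back as None; a result like that may mean the value is not the option the scheduler uses.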

My nova.conf: https://pastebin.com/2QNH5zf5

Any feedback/comments would be greatly appreciated!

Thank you ~
