Assign different GPUs from same system to VMs

asked 2016-09-27 08:44:33 -0500

kvasko

updated 2016-09-27 12:23:07 -0500

I have a system with 8 GPUs in a single box. We are trying to allow VMs to request access to GPU resources on this box.

I know that with PCI passthrough a device can only be assigned to a single VM (1 device <-> 1 VM). However, this box has 8 GPUs (8 separate devices), so I want to support (1 GPU -> 1 VM) * 8, (2 GPU -> 1 VM) * 4, (4 GPU -> 1 VM) * 2, or (8 GPU -> 1 VM) * 1.
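The layouts listed above are the ways of splitting 8 devices into equal groups with none left over. A minimal Python sketch that enumerates them (the function name `gpu_layouts` is made up for illustration):

```python
def gpu_layouts(total_gpus):
    """Return (gpus_per_vm, vm_count) pairs that use every GPU exactly once."""
    return [
        (per_vm, total_gpus // per_vm)
        for per_vm in range(1, total_gpus + 1)
        if total_gpus % per_vm == 0
    ]


if __name__ == "__main__":
    for per_vm, vms in gpu_layouts(8):
        print(f"({per_vm} GPU -> 1 VM) * {vms}")
```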

I have been able to get 1 GPU <-> 1 VM working. However, when I create another VM with a GPU I get "No valid host was found. There are not enough hosts available." This is what I have done so far.

/etc/nova/nova.conf

Add:

pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]

sudo gedit /etc/modules and add:
 pci_stub
 vfio
 vfio_iommu_type1
 vfio_pci
 kvm
 kvm_intel

sudo vi /etc/default/grub
 GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
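A change to /etc/default/grub only takes effect once the GRUB configuration has been regenerated and the host rebooted, so it can be worth confirming that the running kernel actually saw the option. A small Python sketch, assuming a Linux host where /proc/cmdline is readable (the helper names are made up for illustration):

```python
def iommu_enabled(cmdline):
    """True if the kernel command line contains the intel_iommu=on option."""
    return "intel_iommu=on" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()


if __name__ == "__main__":
    print("intel_iommu=on present:", iommu_enabled(read_cmdline()))
```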

//BLACKLIST

sudo gedit /etc/initramfs-tools/modules and add:
 pci_stub ids=10de:17c2

sudo update-initramfs -u
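After a reboot, one way to check that all 8 GPUs (and not just one) ended up bound to the stub driver is to read the standard sysfs entries for each PCI device. This is a sketch under the assumption that the host exposes /sys/bus/pci/devices in the usual layout; the function name `gpu_driver_bindings` is hypothetical:

```python
import os


def gpu_driver_bindings(vendor="0x10de", device="0x17c2",
                        sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> bound driver name (or None) for matching devices."""
    bindings = {}
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev_dir, "vendor")) as fh:
                dev_vendor = fh.read().strip()
            with open(os.path.join(dev_dir, "device")) as fh:
                dev_device = fh.read().strip()
        except OSError:
            continue  # entry without readable vendor/device files
        if (dev_vendor, dev_device) != (vendor, device):
            continue
        driver_link = os.path.join(dev_dir, "driver")
        if os.path.islink(driver_link):
            # The "driver" symlink points at the driver's sysfs directory.
            bindings[addr] = os.path.basename(os.readlink(driver_link))
        else:
            bindings[addr] = None  # no driver currently bound
    return bindings


if __name__ == "__main__":
    root = "/sys/bus/pci/devices"
    if os.path.isdir(root):
        for addr, driver in gpu_driver_bindings(sysfs_root=root).items():
            print(addr, driver)
```

If some of the devices show a different driver, or fewer than 8 addresses are listed, the problem would be on the host side rather than in the scheduler.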

On Controller Node:

Edit nova.conf

Add an alias specific to the GPU you want to use:

pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}
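The alias only helps if its vendor and product IDs match an entry in the passthrough whitelist. A minimal Python sketch that checks the two JSON values from this post against each other (the function name is made up, and this is not how Nova itself validates the options):

```python
import json

# Values copied from the two nova.conf fragments in this post.
WHITELIST = '[{"vendor_id": "10de", "product_id": "17c2"}]'
ALIAS = '{"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}'


def alias_is_whitelisted(alias_json, whitelist_json):
    """True if some whitelist entry has the alias's vendor and product IDs."""
    alias = json.loads(alias_json)
    wanted = (alias["vendor_id"], alias["product_id"])
    return any(
        (entry.get("vendor_id"), entry.get("product_id")) == wanted
        for entry in json.loads(whitelist_json)
    )


if __name__ == "__main__":
    print("alias matches whitelist:", alias_is_whitelisted(ALIAS, WHITELIST))
```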

Add

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
 scheduler_available_filters=nova.scheduler.filters.all_filters
 scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
 scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter

 #: source openrc
 nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
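The value set on the flavor is an alias name and a device count separated by a colon, so the 2, 4 and 8 GPU cases would presumably use "titanx:2", "titanx:4" and "titanx:8" on separate flavors. A small Python sketch of how such a value breaks down (the parser below is illustrative, not Nova's own code):

```python
def parse_alias_spec(spec):
    """Parse 'name:count[,name:count...]' into a list of (name, count)."""
    requests = []
    for part in spec.split(","):
        name, _, count = part.strip().partition(":")
        requests.append((name, int(count)))
    return requests


if __name__ == "__main__":
    for spec in ("titanx:1", "titanx:2", "titanx:4", "titanx:8"):
        print(spec, "->", parse_alias_spec(spec))
```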

If I create 1 VM it works. When I go to create my second VM with the same flavor it errors out with this message.

Message: No valid host was found. There are not enough hosts available.

 Code: 500
 File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances
   context, request_spec, filter_properties)
 File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances
   hosts = self.scheduler_client.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped
   return func(*args, **kwargs)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
   return self.queryclient.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
   return getattr(self.instance, __name)(*args, **kwargs)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations
   return self.scheduler_rpcapi.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations
   return cctxt.call(ctxt, 'select_destinations', **msg_args)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
   retry=self.retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
   timeout=timeout, retry=retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
   retry=retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send
   raise result

Running SELECT * FROM pci_devices; on the nova database, I get the following:

http://imgur.com/a/voGki

As you can see it shows 7 are available.
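To make the comparison between the table and a flavor request explicit, here is a simplified Python model: count the matching rows by status and check whether enough are "available". The rows below are made up to mirror the situation described (8 devices, 7 available); they are not copied from the screenshot, and the real PciPassthroughFilter considers more than this count.

```python
from collections import Counter

# Made-up rows for illustration only; real rows come from the pci_devices table.
EXAMPLE_ROWS = (
    [{"vendor_id": "10de", "product_id": "17c2", "status": "allocated"}]
    + [{"vendor_id": "10de", "product_id": "17c2", "status": "available"}] * 7
)


def count_by_status(rows, vendor_id="10de", product_id="17c2"):
    """Count rows per status for devices with the given vendor/product IDs."""
    return Counter(
        row["status"]
        for row in rows
        if row["vendor_id"] == vendor_id and row["product_id"] == product_id
    )


def can_satisfy(rows, requested):
    """True if at least `requested` matching devices are marked available."""
    return count_by_status(rows)["available"] >= requested


if __name__ == "__main__":
    print(dict(count_by_status(EXAMPLE_ROWS)))
    print("room for titanx:1 ->", can_satisfy(EXAMPLE_ROWS, 1))
```

If a check like this says the request fits while the scheduler still rejects it, that would suggest looking at the other filters or the scheduler logs rather than at the device count.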
