Pike - SR-IOV instance - not enough hosts available

Hi

I have installed OpenStack Pike on CentOS 7 (Linux networkingnode 3.10.0-862.6.3.el7.x86_64). I can create instances, networks, routers, volumes and so on with no issues.

The one exception is creating an instance with an SR-IOV port: in that case only, I get the error "There are not enough hosts available".

I have dedicated my Mellanox Technologies MT27520 Family [ConnectX-3 Pro] NIC to SR-IOV.

This is how I configured the controller and compute nodes:

1- Enable SR-IOV in BIOS

2- Add the options intel_iommu=on iommu=pt to the kernel command line:

/etc/sysconfig/grub: 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos_computenode/root rd.lvm.lv=centos_computenode/swap rhgb quiet intel_iommu=on iommu=pt"
GRUB_DISABLE_RECOVERY="true"

and then

[root@computenode ~]# dracut --regenerate-all --force

and reboot the server
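
The steps above do not show the GRUB configuration being regenerated, so after the reboot it may be worth confirming that the options actually reached the running kernel. A minimal sketch follows; the helper name has_kernel_opt is hypothetical, and on a live system it would be fed the contents of /proc/cmdline.

```shell
#!/bin/sh
# has_kernel_opt CMDLINE OPT: succeed if OPT is one of the space-separated
# words in CMDLINE. (Hypothetical helper name, for illustration only.)
has_kernel_opt() {
    # $1 is left unquoted on purpose so the shell splits it into words.
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# On a live system, pass the running kernel's command line:
#   has_kernel_opt "$(cat /proc/cmdline)" intel_iommu=on && echo "intel_iommu=on is active"
```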

3- NIC driver installation:

[root@computenode ~]# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root@computenode ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Download MLNX_OFED_LINUX-4.4-1.0.0.0-rhel7.5-x86_64.tar, then extract and install it:

tar -xvf MLNX_OFED_LINUX-4.4-1.0.0.0-rhel7.5-x86_64.tar
./mlnxofedinstall 

[root@computenode]#  modprobe -rv  ib_isert rpcrdma ib_srpt
rmmod ib_isert
rmmod iscsi_target_mod
rmmod rpcrdma
rmmod ib_srpt
[root@computenode]#  /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
[root@computenode MLNX_OFED_LINUX-4.4-1.0.0.0-rhel7.5-x86_64]#

reboot the server

[root@computenode ~]# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices

[root@computenode ~]# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4103_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:04:00.0 bar=0x96400000 size=0x100000
                                   Chip revision is: 00


[root@computenode ~]# mlxconfig -d /dev/mst/mt4103_pciconf0 q

Device #1:
----------

Device type:    ConnectX3Pro
Device:         /dev/mst/mt4103_pciconf0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)
         NUM_OF_VFS                          8
         LOG_BAR_SIZE                        3
         BOOT_OPTION_ROM_EN_P1               False(0)
         BOOT_VLAN_EN_P1                     False(0)
         BOOT_RETRY_CNT_P1                   0
         LEGACY_BOOT_PROTOCOL_P1             None(0)
         BOOT_VLAN_P1                        1
         BOOT_OPTION_ROM_EN_P2               False(0)
         BOOT_VLAN_EN_P2                     False(0)
         BOOT_RETRY_CNT_P2                   0
         LEGACY_BOOT_PROTOCOL_P2             None(0)
         BOOT_VLAN_P2                        1
         IP_VER_P1                           IPv4(0)
         IP_VER_P2                           IPv4(0)
         CQ_TIMESTAMP                        True(1)

[root@computenode ~]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0
        Node GUID: 0xec0d9a0300e78930
        System image GUID: 0xec0d9a0300e78930
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0xee0d9afffee78930
                Link layer: Ethernet
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0xee0d9afffee78931
                Link layer: Ethernet

Create (or edit) /etc/modprobe.d/mlx4_core.conf

options mlx4_core num_vfs=8 port_type_array=2,2 probe_vf=0

Restart the driver

/etc/init.d/openibd restart

Check that the VFs can be seen via lspci

[root@computenode modprobe.d]# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
04:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

4- Configure OpenStack according to https://docs.openstack.org/neutron/pike/admin/config-sriov.html

echo '8' > /sys/class/net/p3p1/device/sriov_numvfs

Modifying the /sbin/ifup file:

#!/bin/sh
if [[ "$1" == "p3p1" ]]
then
    echo '8' > /sys/class/net/p3p1/device/sriov_numvfs
fi

Check sriov_totalvfs

cat /sys/class/net/eth3/device/sriov_totalvfs
8

Verify that the VFs have been created and are in up state

[root@computenode ~]# lspci | grep Ethernet
01:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
04:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
04:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

Persist created VFs on reboot:

echo "echo '8' > /sys/class/net/p3p1/device/sriov_numvfs" >> /etc/rc.local
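
To compare the number of VFs currently enabled against the maximum the device reports, something like the sketch below could be used. The helper name vf_status is hypothetical, and it assumes the device directory exposes the sriov_numvfs and sriov_totalvfs files used in the commands above.

```shell
#!/bin/sh
# vf_status DEVICE_DIR: print "enabled/total" VF counts read from a
# device's sysfs directory. (Hypothetical helper; DEVICE_DIR is a
# parameter so it can be pointed at any interface.)
vf_status() {
    enabled=$(cat "$1/sriov_numvfs") || return 1
    total=$(cat "$1/sriov_totalvfs") || return 1
    echo "$enabled/$total"
}

# Example on the compute node, using the interface name from this post:
#   vf_status /sys/class/net/p3p1/device
```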

Whitelist PCI devices for nova-compute (Compute) in the /etc/nova/nova.conf file:

[default]
pci_passthrough_whitelist = { "devname": "p3p1", "physical_network": "physnet"}
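
The section header here is written [default], while the scheduler snippets elsewhere in this post use [DEFAULT] and [Default]. In case the spelling of the header matters, a small sketch like the one below reports which header a given key actually sits under in a file. The helper name section_of is hypothetical, and it only handles simple "key = value" lines.

```shell
#!/bin/sh
# section_of KEY FILE: print the [section] header that each occurrence of
# KEY sits under in an INI-style file. (Hypothetical helper name.)
section_of() {
    awk -v key="$1" '
        /^\[.*\]/ { section = $0; next }   # remember the latest header
        {
            line = $0
            sub(/^[ \t]+/, "", line)       # drop leading whitespace
            split(line, kv, /[ \t]*=/)     # kv[1] is the text before "="
            if (kv[1] == key) print section
        }
    ' "$2"
}

# Example on the compute node:
#   section_of pci_passthrough_whitelist /etc/nova/nova.conf
```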

Restart the nova-compute service

Configure neutron-server (Controller), /etc/neutron/plugins/ml2/ml2_conf.ini:

mechanism_drivers = linuxbridge,l2population,openvswitch,sriovnicswitch

Restart the neutron-server service

Configure nova-scheduler (Controller), /etc/nova/nova.conf

[DEFAULT]
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
scheduler_available_filters = nova.scheduler.filters.all_filters

Restart the nova-scheduler service

Enable neutron sriov-agent (Compute)

yum install openstack-neutron-sriov-nic-agent -y

Edit the /etc/neutron/plugins/ml2/sriov_agent.ini

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver

[sriov_nic]
physical_device_mappings = physnet:p3p1

Enable the neutron sriov-agent service

systemctl enable neutron-sriov-nic-agent.service
systemctl start neutron-sriov-nic-agent.service

FDB L2 agent extension: edit /etc/neutron/plugins/ml2/linuxbridge_agent.ini:

[agent]
extensions = fdb
[FDB]
shared_physical_device_mappings = physnet:p3p1

Update Controller, /etc/neutron/plugins/ml2/sriov_agent.ini

[sriov_nic]
physical_device_mappings = physnet:p3p1

Update Controller, /etc/neutron/plugins/ml2/ml2_conf.ini

[ml2_type_flat]
flat_networks = provider,physnet

reboot the controller

Check the controller:

[root@networkingnode ~]# openstack network agent list
+--------------------------------------+--------------------+----------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host           | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+----------------+-------------------+-------+-------+---------------------------+
| 67cf9ef3-6613-4d49-b781-550d7c1eff31 | Linux bridge agent | computenode    | None              | :-)   | UP    | neutron-linuxbridge-agent |
| a0c0fbbe-ed50-4233-95cc-a8acfbe2ad86 | L3 agent           | networkingnode | nova              | :-)   | UP    | neutron-l3-agent          |
| c9a98eff-efeb-4a97-9a2e-ecb48f0760fa | Linux bridge agent | networkingnode | None              | :-)   | UP    | neutron-linuxbridge-agent |
| e7d005a2-1987-4d6d-a45c-fc6b21292e2c | NIC Switch agent   | computenode    | None              | :-)   | UP    | neutron-sriov-nic-agent   |
| eba6a2c0-229c-4ab2-be5d-0975c2a45e3f | Metadata agent     | networkingnode | None              | :-)   | UP    | neutron-metadata-agent    |
| f6443b89-96a1-4c43-a275-86a6496e9445 | DHCP agent         | networkingnode | nova              | :-)   | UP    | neutron-dhcp-agent        |
+--------------------------------------+--------------------+----------------+-------------------+-------+-------+---------------------------+

Finally, when I try to create an instance on my SR-IOV network, I get the error: "There are not enough hosts available".

I then found another article and tried the following on top of that:

On the controller:

/etc/neutron/plugins/ml2/ml2_conf.ini

type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = linuxbridge,l2population,openvswitch,sriovnicswitch
extension_drivers = port_security

[securitygroup]
enable_ipset = true

/etc/nova/nova.conf

[Default]

scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter
scheduler_available_filters = nova.scheduler.filters.all_filters
scheduler_available_filters = nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter

/usr/lib/systemd/system/neutron-server.service

ExecStart=/usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-file /etc/neutron/plugins/ml2/sriov_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server --log-file /var/log/neutron/server.log

On the compute node:

yum install openstack-neutron-ml2 -y

/etc/neutron/plugins/ml2/ml2_conf.ini

[securitygroup]
enable_security_group = True
firewall_driver = neutron.agent.firewall.NoopFirewallDriver

/etc/neutron/plugins/ml2/sriov_agent.ini

[sriov_nic]
physical_device_mappings = physnet:p3p1

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver

/usr/lib/systemd/system/neutron-sriov-nic-agent.service

ExecStart=/usr/bin/neutron-sriov-nic-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/sriov_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-sriov-nic-agent --log-file /var/log/neutron/sriov-nic-agent.log

After these changes, creating an instance on my SR-IOV network still fails with the same error: "There are not enough hosts available".

I'm not sure what else to try. I have been looking around and can't find anything more. What should I do?

Note: nova-scheduler.log shows the following messages (but I have no idea what to do with them):

2018-07-17 16:35:01.692 1121 INFO nova.scheduler.filters.retry_filter [req-3340d0f0-e90e-4c76-9932-d92c801edd08 409a679ba4c840eeb46b12768c6ef60a a72a5d6b06d14b63acec9774146b0f6e - default default] Host [u'computenode', u'computenode'] fails.  Previously tried hosts: [[u'computenode', u'computenode']]
2018-07-17 16:35:01.693 1121 INFO nova.filters [req-3340d0f0-e90e-4c76-9932-d92c801edd08 409a679ba4c840eeb46b12768c6ef60a a72a5d6b06d14b63acec9774146b0f6e - default default] Filter RetryFilter returned 0 hosts
2018-07-17 16:35:01.693 1121 INFO nova.filters [req-3340d0f0-e90e-4c76-9932-d92c801edd08 409a679ba4c840eeb46b12768c6ef60a a72a5d6b06d14b63acec9774146b0f6e - default default] Filtering removed all hosts for the request with instance ID 'e17bb991-cba8-4cf8-91df-99f6501bb7c8'. Filter results: ['RetryFilter: (start: 1, end: 0)']
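
To see at a glance which filter eliminated the host, lines like the ones above can be reduced to just the filter names. This is only a sketch: the helper name zero_host_filters is hypothetical, it relies on the "Filter <name> returned 0 hosts" wording seen in this log, and the log directory in the usage comment is an assumption.

```shell
#!/bin/sh
# zero_host_filters: read nova-scheduler log lines on stdin and print the
# name of each filter reported as returning 0 hosts.
zero_host_filters() {
    sed -n 's/.*Filter \([A-Za-z]*\) returned 0 hosts.*/\1/p'
}

# Example (the path is an assumption, adjust to where the log lives):
#   zero_host_filters < /var/log/nova/nova-scheduler.log | sort | uniq -c
```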

Any help will be greatly appreciated. If you need further logs or configuration, let me know and I'll be happy to provide them.

Thanks