
VNF deployment (with 12 VMs) fails with a 64 GB RAM flavor but passes with 48 GB RAM.

asked 2017-11-06 02:27:08 -0500 by udayutkarsh

updated 2017-11-06 04:38:57 -0500

As part of application onboarding we needed to deploy 12 VMs with vcpus=32, disk=300 GB and ram=64 GB, but it failed every time. However, it passes when we reduce the RAM to 48 GB:

I have modified the page size in the flavor as follows::

                       [stack@undercloud ~(pcrf)]$nova flavor-show DSC17R4_PCRF1
                +----------------------------+--------------------------------------+
                | Property                   | Value                                |
                +----------------------------+--------------------------------------+
                | OS-FLV-DISABLED:disabled   | False                                |
                | OS-FLV-EXT-DATA:ephemeral  | 0                                    |
                | disk                       | 300                                  |
                | extra_specs                | {"hw:mem_page_size": "any"}          |
                | id                         | 8d81f732-4768-4495-833d-608ffbb05d9f |
                | name                       | DSC17R4_PCRF1                        |
                | os-flavor-access:is_public | True                                 |
                | ram                        | 65536                                |
                | rxtx_factor                | 1.0                                  |
                | swap                       |                                      |
                | vcpus                      | 32                                   |
                +----------------------------+--------------------------------------+
                [stack@undercloud ~(pcrf)]$
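Worth noting (an inference on my part, not stated in the thread): once `hw:mem_page_size` is set, Nova gives the guest an implicit NUMA topology of a single virtual node, so all 32 vCPUs and 64 GB of RAM must fit inside one host NUMA cell. A hedged sketch of splitting the guest across both host cells instead:

```shell
# Sketch, using the flavor name from the question. With hw:numa_nodes=2
# the scheduler may place half the vCPUs and RAM on each host NUMA cell
# instead of requiring a single cell to hold all 64 GB. Verify the key
# is supported on your Nova release before relying on it.
nova flavor-key DSC17R4_PCRF1 set hw:numa_nodes=2
nova flavor-show DSC17R4_PCRF1   # extra_specs should now list hw:numa_nodes
```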

nova-conductor.log has the following::

            2017-11-06 13:42:19.208 6062 WARNING nova.scheduler.utils [req-9fefb7bd-70ed-43ce-a391-26ec5289b837 b28fc20a882d4d64acdc79b263860852 adc3298a1a154893bfad05d73014a8aa - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 59c90da5-caeb-4d0b-84be-4f7718e0be13. Last exception: [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1905, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2044, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=e.format_message())\n', u'RescheduledException: Build of instance 59c90da5-caeb-4d0b-84be-4f7718e0be13 was re-scheduled: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.\n']
            2017-11-06 13:42:19.209 6062 WARNING nova.scheduler.utils [req-9fefb7bd-70ed-43ce-a391-26ec5289b837 b28fc20a882d4d64acdc79b263860852 adc3298a1a154893bfad05d73014a8aa - - -] [instance: 59c90da5-caeb-4d0b-84be-4f7718e0be13] Setting instance to ERROR state.


I chose an empty compute node and tried launching a CirrOS test instance with the same flavor:

        nova --debug boot  cirros --image cirros_image --flavor DSC17R4_PCRF1 --nic net-id=f64f4495-1953-4ff8-8420-7a55c42a2d55 --availability_zone=zone2:overcloud-compute-25.localdomain

But this also failed for the same reason::

        | fault | {"message": "Build of instance 12032a8b-436e-49c7-8bcf-8ad8077bccc3 was re-scheduled: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1905, in _do_build_and_run_instance |
        |       |     filter_properties) ...

        My hypervisor stats::

        [cbis-admin@overcloud-controller-0 ~(pcrf)]$nova hypervisor-show overcloud-compute-25.localdomain
        +---------------------------+------------------------------------------+
        | Property                  | Value                                    |
        +---------------------------+------------------------------------------+
        | cpu_info_arch             | x86_64                                   |
        | cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
        |                           | "sep", "syscall", "tsc_adjust", "vme",   |
        |                           | "dtes64", "invpcid", "msr", "sse",       |
        |                           | "xsave", "vmx", "erms", "xtpr", "cmov",  |
        |                           | "smep", "nx", "est", "pat", "monitor",   |
        |                           | "smx", "pbe", "lm", "tsc", "fpu",        |
        |                           | "fxsr", "tm", "sse4.1", "pae", "sse4.2", |
        |                           | "pclmuldq", "pcid", "fma", "tsc-         |
        |                           | deadline", "mmx", "osxsave", "cx8",      |
        |                           | "mce", "de", "tm2", "ht", "dca", "pni",  |
        |                           | "abm", "popcnt", "mca", "pdpe1gb",       |
        |                           | "apic", "fsgsbase", "f16c", "pse", "ds", |
        |                           | "invtsc", "lahf_lm", "aes", "avx2",      |
        |                           | "sse2", "ss", "ds_cpl", "arat", "bmi1",  |
        |                           | "bmi2", "acpi", "ssse3", "rdtscp",       |
        |                           | "cx16", "pse36", "mtrr", "movbe",        |
        |                           | "pdcm", "cmt", "rdrand", "x2apic"]       |
        | cpu_info_model            | Haswell-noTSX                            |
        | cpu_info_topology_cells   | 2                                        |
        | cpu_info_topology_cores   | 12                                       |
        | cpu_info_topology_sockets | 1                                        |
        | cpu_info_topology_threads | 2                                        |
        | cpu_info_vendor           | Intel                                    |
        | current_workload          | 0                                        |
        | disk_available_least      | 121804                                   |
        | free_disk_gb              | 122640                                   |
        | free_ram_mb               | 194434                                   |
        | host_ip                   | 172.31.255.67                            |
        | hypervisor_hostname       | overcloud-compute-25.localdomain         |
        | hypervisor_type           | QEMU                                     |
        | hypervisor_version        | 2006000                                  |
        | id                        | 149                                      |
        | local_gb                  | 122640                                   |
        | local_gb_used             | 0                                        |
        | memory_mb                 | 196482                                   |
        | memory_mb_used            | 2048                                     |
        | running_vms               | 0                                        |
        | service_disabled_reason   | None                                     |
        | service_host              | overcloud-compute-25.localdomain         |
        | service_id                | 197                                      |
        | state                     | up                                       |
        | status                    | enabled                                  |
        | vcpus                     | 42                                       |
        | vcpus_used                | 0                                        |
        +---------------------------+------------------------------------------+
        [cbis-admin@overcloud-controller-0 ~(pcrf)]$
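Back-of-envelope arithmetic from the stats above (hedged: the per-cell figures assume memory is populated evenly across the two cells):

```shell
# Figures taken from the hypervisor-show output above.
total_mb=196482      # memory_mb
cells=2              # cpu_info_topology_cells
cores=12; threads=2  # cpu_info_topology_cores / _threads
per_cell_mb=$((total_mb / cells))
per_cell_cpus=$((cores * threads))
echo "per cell: ${per_cell_cpus} pCPU threads, ${per_cell_mb} MB"
# -> per cell: 24 pCPU threads, 98241 MB
# A 65536 MB single-virtual-NUMA-node guest fits in 98241 MB only if
# enough of that memory is actually free on one cell; any per-node
# hugepage reservation below 64 GB would make the fit fail.
```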

    [cbis-admin@overcloud-compute-25 ~]$ cat /proc/cpuinfo | grep processor
    processor       : 0
    processor       : 1
    processor       : 2
    processor       : 3
    processor       : 4
    processor       : 5
    processor       : 6
    processor       : 7
    processor       : 8
    processor       : 9
    processor       : 10
    processor       : 11
    processor       : 12
    processor       : 13
    processor       : 14
    processor       : 15
    processor       : 16
    processor       : 17
    processor       : 18
    processor       : 19
    processor       : 20
    processor       : 21 ...

2 answers


answered 2017-11-07 02:42:18 -0500 by Mohit

Try the same with 60 GB of RAM instead of 64 GB; this may work.


Comments

No luck; getting a different error this time::

 fault                                | {"message": "Build of instance f032bf21-c3d4-4e26-8694-af720672fb51 was re-scheduled: internal error: qemu unexpectedly closed the monitor: warning: host doesn't support requested feature: CPUID.01H:EDX.ds
udayutkarsh ( 2017-11-07 05:45:49 -0500 )

Mohit ( 2017-11-08 02:27:41 -0500 )
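The CPUID error in that last attempt points away from memory: qemu is being asked for a guest CPU flag ("ds", Debug Store) that the Haswell-noTSX host does not expose, which suggests the configured libvirt guest CPU model, not RAM, is the problem. A hedged sketch of letting libvirt mirror the host CPU (the `host-model` choice and `crudini` availability are assumptions; verify against your release before applying):

```shell
# Assumption: a TripleO/RDO compute node where crudini is installed and
# the service is named openstack-nova-compute. host-model asks libvirt
# to copy the host CPU definition, so no unsupported flags are requested.
sudo crudini --set /etc/nova/nova.conf libvirt cpu_mode host-model
sudo systemctl restart openstack-nova-compute
```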

answered 2017-11-06 12:11:40 -0500 by zaneb

Umm, it seems pretty clear that you don't have enough RAM?

Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.

Comments

I have 192 GB of RAM available on the compute node. Moreover, it passes with 20 vCPUs and 64 GB RAM, and with 32 vCPUs and 48 GB RAM, but fails with the 32 vCPU / 64 GB RAM combination.

udayutkarsh ( 2017-11-07 01:21:53 -0500 )

With NUMA architectures, RAM is split into banks, so not every processor can access all of it equally. I'm guessing the VM is requesting uniform memory access, and the host cannot supply that much RAM across that many vCPUs.

zaneb ( 2017-11-07 07:58:29 -0500 )
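zaneb's point can be checked directly on the compute node. A hedged sketch of the inspection commands (run on overcloud-compute-25; `numactl` may need to be installed first):

```shell
# Show what each NUMA cell can actually supply to a single-node guest:
numactl --hardware                                        # CPUs and free MB per node
grep MemFree /sys/devices/system/node/node*/meminfo       # per-node free memory
virsh capabilities | grep -A 3 '<cell id'                 # libvirt's view of the cells
```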

Stats

Asked: 2017-11-06 02:27:08 -0500

Seen: 383 times

Last updated: Nov 07 '17