High load on instance preventing live migration (Juno)

asked 2016-05-13 12:47:18 -0600

RedCricket

Hi,

I have run into situations where a live migration never completes and never errors out; the instance just stays in the migrating state.

Here is how I have been able to reproduce the problem.

Here is the instance I am migrating:

[root@osc1-mgmt-001 tmp]# nova show gb72-net-002-org-001
+--------------------------------------+---------------------------------------------------------------------+
| Property                             | Value                                                               |
+--------------------------------------+---------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                              |
| OS-EXT-AZ:availability_zone          | nova                                                                |
| OS-EXT-SRV-ATTR:host                 | osc1-net-002.example.com                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | osc1-net-002.example.com                                            |
| OS-EXT-SRV-ATTR:instance_name        | gb72-net-002-org-001                                                |
| OS-EXT-STS:power_state               | 1                                                                   |
| OS-EXT-STS:task_state                | migrating                                                           |
| OS-EXT-STS:vm_state                  | active                                                              |
| OS-SRV-USG:launched_at               | 2016-05-12T20:01:23.000000                                          |
| OS-SRV-USG:terminated_at             | -                                                                   |
| accessIPv4                           |                                                                     |
| accessIPv6                           |                                                                     |
| config_drive                         |                                                                     |
| created                              | 2016-05-12T20:00:58Z                                                |
| flavor                               | gb72_vm (668ca3b4-a7c0-4309-a11e-4fb5377e4180)                      |
| hostId                               | 44206a2390a038b0ede2a4375f1239b0cef917149bd5976fcada6781            |
| id                                   | 3b176ee2-fcf3-41a6-b658-361ffd19639e                                |
| image                                | CentOS-7-x86_64-GenericCloud (588e035d-2e1e-4720-94c4-8b000bf9d2ef) |
| key_name                             | nk                                                                  |
| metadata                             | {}                                                                  |
| name                                 | gb72-net-002-org-001                                                |
| os-extended-volumes:volumes_attached | [{"id": "16afe52c-31b0-4a3a-b718-aa1789df2852"}]                    |
| public-47 network                    | 10.29.105.13                                                        |
| security_groups                      | default                                                             |
| status                               | MIGRATING                                                           |
| tenant_id                            | 9d011b7c8d104af1b887e229cee436d2                                    |
| updated                              | 2016-05-13T17:07:48Z                                                |
| user_id                              | fa8b956c89304124967bb4bcea54124b                                    |
+--------------------------------------+---------------------------------------------------------------------+
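To check repeatedly whether the migration is making progress, the interesting fields (task_state, status, host) can be pulled out of that table. This is only a minimal sketch, assuming the table layout shown above. The helper name `nova_field` is hypothetical; it just parses text on stdin and does not call nova itself.

```shell
# Hypothetical helper: print the value of one property from a
# `nova show`-style table read on stdin.
nova_field() {
    awk -F'|' -v key="$1" '
        {
            # Column 2 is the property name, column 3 is its value.
            name = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}
```

It would be used as `nova show gb72-net-002-org-001 | nova_field OS-EXT-STS:task_state`, which for the output above prints `migrating`.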

gb72_vm is a flavor I created. It looks like this:

[root@osc1-mgmt-001 tmp]# nova flavor-show gb72_vm
+----------------------------+--------------------------------------+
| Property                   | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 20                                   |
| extra_specs                | {}                                   |
| id                         | 668ca3b4-a7c0-4309-a11e-4fb5377e4180 |
| name                       | gb72_vm                              |
| os-flavor-access:is_public | True                                 |
| ram                        | 72000                                |
| rxtx_factor                | 1.0                                  |
| swap                       | 16000                                |
| vcpus                      | 8                                    |
+----------------------------+--------------------------------------+

After launching the instance, I installed stress and am now running it on the instance like so:

[centos@gb72-net-002-org-001 stress-1.0.4]$ stress -c 6 -m 4 --vm-bytes 512M
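For anyone reading the flags: `-c 6` starts six CPU-bound workers, and `-m 4 --vm-bytes 512M` starts four workers that each repeatedly allocate, touch, and free 512 MB. As a rough sketch of how much guest memory is being rewritten while the migration tries to copy it (the variable names are just for illustration):

```shell
# Values taken from the stress command line above.
vm_workers=4        # -m 4
vm_bytes_mb=512     # --vm-bytes 512M

# Upper bound on memory the vm workers are churning at any moment.
total_mb=$(( vm_workers * vm_bytes_mb ))
echo "stress keeps up to ${total_mb} MB of guest memory in churn"
```

That comes to 2048 MB. It is an upper bound: each worker is usually partway through an allocate/touch/free cycle, so the resident size of each worker at any given instant is typically smaller.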

I am also running top on the instance. Its output looks like this:

top - 17:17:02 up 21:15,  1 user,  load average: 10.11, 10.08, 10.06
Tasks: 149 total,  12 running, 137 sleeping,   0 stopped,   0 zombie
%Cpu(s): 62.0 us, 38.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 72323392 total, 70503632 free,  1344768 used,   474988 buff/cache
KiB Swap: 16383996 total, 16383996 free,        0 used. 70740048 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10273 centos    20   0    7260     96      0 R  86.7  0.0   1008:21 stress
10276 centos    20   0    7260     96      0 R  84.7  0.0   1008:22 stress
10271 centos    20   0    7260     96      0 R  84.1  0.0   1008:00 stress
10275 centos    20   0    7260     96      0 R  82.1  0.0   1009:28 stress
10270 centos    20   0  531552 218716    176 R  80.7  0.3   1011:42 stress
10272 centos    20   0  531552 142940    176 R  80.4  0.2   1012:40 stress
10269 centos    20   0    7260     96      0 R  78.7  0.0   1008:38 stress
10274 centos    20   0  531552 333404    176 R  73.1  0.5   1012:32 stress
10267 centos    20   0    7260     96      0 R  70.4  0.0   1008:41 stress
10268 centos    20   0  531552  38452    176 R  65.8  0.1   1011:29 stress
    1 root      20   0  191352   6652   3908 S   0.0  0.0   0:06.00 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:01.45 ksoftirqd/0
    5 root ...