OpenStack failure caused by a compute node running out of memory

asked 2017-07-06 04:31:29 -0500

叮当

Swap usage on a compute node reached 100%, so the operating system's OOM killer ran and killed the instance's qemu-kvm process. When I then tried to start the instance again, the instance's files could no longer be found in the Ceph storage.

messages log:

Jul 1 22:06:04 node7 kernel: [52503] 0 52503 26974 24 11 0 0 sleep
Jul 1 22:06:04 node7 kernel: Out of memory: Kill process 59364 (qemu-kvm) score 60 or sacrifice child
Jul 1 22:06:04 node7 kernel: Killed process 59364 (qemu-kvm) total-vm:18006708kB, anon-rss:16825612kB, file-rss:0kB
Jul 1 22:06:04 node7 journal: internal error: End of file from monitor
Jul 1 22:06:06 node7 kernel: qbrf555ac02-51: port 2(tapf555ac02-51) entered disabled state
Jul 1 22:06:06 node7 kernel: device tapf555ac02-51 left promiscuous mode
Jul 1 22:06:06 node7 kernel: qbrf555ac02-51: port 2(tapf555ac02-51) entered disabled state
Jul 1 22:06:07 node7 NetworkManager[1374]: <info> (tapf555ac02-51): device state change: activated -> unmanaged (reason 'removed') [100 10 36]
Jul 1 22:06:07 node7 NetworkManager[1374]: <warn> (qbrf555ac02-51): failed to detach bridge port tapf555ac02-51
Jul 1 22:06:07 node7 systemd-machined: Machine qemu-198-instance-000002a0 terminated.
Jul 1 22:06:07 node7 NetworkManager[1374]: <warn> (tapf555ac02-51): failed to disable userspace IPv6LL address handling
Jul 1 22:06:07 node7 dbus-daemon: dbus[1286]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 1 22:06:07 node7 dbus[1286]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 1 22:06:07 node7 systemd: Starting Network Manager Script Dispatcher Service...
Jul 1 22:06:07 node7 kvm: 50 guests now active
Jul 1 22:06:07 node7 journal: End of file while reading data: Input/output error
Jul 1 22:06:07 node7 dbus-daemon: dbus[1286]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 1 22:06:07 node7 dbus[1286]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
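For anyone triaging a similar log, the OOM kill entries can be pulled out programmatically to see which process was killed and how much memory it held. The following is a minimal Python sketch, assuming the kernel's "Killed process" line format exactly as it appears in the messages log in this question. The function name `parse_oom_kills` and the regular expression are illustrative only and are not part of any OpenStack or kernel tool.

```python
import re

# Matches the kernel's "Killed process" line in the format seen in this
# question's messages log. Other kernel versions may print extra fields.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return one dict per OOM kill found in log_text (sizes in kB)."""
    kills = []
    for m in KILLED_RE.finditer(log_text):
        kills.append({
            "pid": int(m.group("pid")),
            "name": m.group("name"),
            "total_vm_kb": int(m.group("total_vm")),
            "anon_rss_kb": int(m.group("anon_rss")),
            "file_rss_kb": int(m.group("file_rss")),
        })
    return kills


# The kill line copied from the messages log in this question.
sample = (
    "Jul 1 22:06:04 node7 kernel: Killed process 59364 (qemu-kvm) "
    "total-vm:18006708kB, anon-rss:16825612kB, file-rss:0kB"
)

for kill in parse_oom_kills(sample):
    # Convert kB to GiB for readability.
    rss_gib = kill["anon_rss_kb"] / (1024 * 1024)
    print(f"{kill['name']} pid={kill['pid']} anon-rss={rss_gib:.1f} GiB")
```

For the line above this reports a qemu-kvm process holding roughly 16 GiB of anonymous memory, which matches the kernel picking a large guest as the OOM victim.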

nova-compute log:

2017-07-01 22:06:22.710 18891 INFO nova.compute.manager [-] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] VM Stopped (Lifecycle Event)
2017-07-01 22:06:22.950 18891 INFO nova.compute.manager [req-4da3e116-8d96-4d52-9468-0ee18e59ed89 - - - - -] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2017-07-01 22:06:23.063 18891 WARNING nova.compute.manager [req-4da3e116-8d96-4d52-9468-0ee18e59ed89 - - - - -] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2017-07-01 22:06:23.280 18891 INFO nova.compute.manager [req-4da3e116-8d96-4d52-9468-0ee18e59ed89 - - - - -] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] Instance is already powered off in the hypervisor when stop is called.
2017-07-01 22:06:23.387 18891 INFO nova.virt.libvirt.driver [req-4da3e116-8d96-4d52-9468-0ee18e59ed89 - - - - -] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] Instance already shutdown.
2017-07-01 22:06:23.401 18891 INFO nova.virt.libvirt.driver [-] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] Instance destroyed successfully.

When I started the instance again, nova-compute logged errors:

2017-07-03 09:21:44.348 18891 ERROR nova.virt.libvirt.driver [req-b9cd04f0-59ee-46ea-bc3f-8d5adcc57dae 3fc63b48ddd548a39eccc02594a928e8 9b21771470814e599c107f326f102374 - - -] [instance: 7e7e826b-c7cf-4361-8aaa-c88a8976ecc0] Failed to start libvirt guest
2017-07-03 09:21:44.545 18891 INFO os_vif [req-b9cd04f0-59ee-46ea-bc3f-8d5adcc57dae 3fc63b48ddd548a39eccc02594a928e8 9b21771470814e599c107f326f102374 - - -] Successfully unplugged vif VIFBridge(active=True,address=fa:16:3e:d6:4f:b1,bridge_name='qbrf555ac02-51',has_traffic_filtering=True,id=f555ac02-517c-460f-9e8c-bd0e0e433ec4,network=Network(6fffe4d0-21e1-496e-a462-800f8230fbd7),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name='tapf555ac02-51')
2017-07-03 09:21:50.077 18891 INFO nova.compute.resource_tracker [req-b7069818-7930-4cd8-bb89-834332ec5990 - - - - -] Auditing locally available compute resources for node node7
2017-07-03 09:21:51.700 18891 INFO nova.compute.resource_tracker [req-b7069818-7930-4cd8-bb89-834332ec5990 - - - - -] Total usable vcpus: 25, total allocated vcpus: 101
2017-07-03 09:21:51.702 18891 INFO nova.compute.resource_tracker [req-b7069818-7930-4cd8-bb89-834332ec5990 - - - - -] Final resource view: name=node7 phys_ram=262109MB used_ram ...
