VM filesystem shutting down
Dear OpenStack community,
I have an OpenStack environment using QEMU/KVM as the hypervisor. The physical hosts run CentOS 7.5 with Btrfs, and the VMs use XFS. My problem is that a VM becomes unresponsive when I run heavy jobs such as software compilation. The console errors are shown below:
kudu-test-1 login: [ 43.865030] random: crng init done
[10769.996980] blk_update_request: I/O error, dev vda, sector 63329504
[10770.003649] blk_update_request: I/O error, dev vda, sector 63329680
[10770.005974] blk_update_request: I/O error, dev vda, sector 63329832
[10770.008282] Buffer I/O error on dev vda1, logical block 7915932, lost async page write
[10770.011173] Buffer I/O error on dev vda1, logical block 7915933, lost async page write
[10770.014903] Buffer I/O error on dev vda1, logical block 7915934, lost async page write
[10770.019257] Buffer I/O error on dev vda1, logical block 7915935, lost async page write
[10770.024083] Buffer I/O error on dev vda1, logical block 7915936, lost async page write
[10770.028534] Buffer I/O error on dev vda1, logical block 7915937, lost async page write
[10770.032472] Buffer I/O error on dev vda1, logical block 7915938, lost async page write
[10770.036191] Buffer I/O error on dev vda1, logical block 7915939, lost async page write
[10770.040515] Buffer I/O error on dev vda1, logical block 7915940, lost async page write
[10770.043443] Buffer I/O error on dev vda1, logical block 7915941, lost async page write
[11963.873944] blk_update_request: I/O error, dev vda, sector 106841432
[11963.922842] buffer_io_error: 9 callbacks suppressed
[11963.926768] Buffer I/O error on dev vda1, logical block 13354411, lost async page write
[11963.930445] Buffer I/O error on dev vda1, logical block 13354412, lost async page write
[11963.933580] Buffer I/O error on dev vda1, logical block 13354413, lost async page write
[11963.937348] Buffer I/O error on dev vda1, logical block 13354414, lost async page write
[11963.941300] Buffer I/O error on dev vda1, logical block 13354415, lost async page write
[11963.944535] Buffer I/O error on dev vda1, logical block 13354416, lost async page write
[11963.947552] Buffer I/O error on dev vda1, logical block 13354417, lost async page write
[11963.953007] Buffer I/O error on dev vda1, logical block 13354418, lost async page write
[11963.956181] Buffer I/O error on dev vda1, logical block 13354419, lost async page write
[11963.959760] Buffer I/O error on dev vda1, logical block 13354420, lost async page write
[12644.472090] blk_update_request: I/O error, dev vda, sector 98732744
[12644.486750] blk_update_request: I/O error, dev vda, sector 98733768
[12644.488994] buffer_io_error: 2507 callbacks suppressed
[12644.490653] Buffer I/O error on dev vda1, logical block 12335719, lost async page write
[12644.493055] Buffer I/O error on dev vda1, logical block 12335720, lost async page write
[12644.495430] Buffer I/O error on dev vda1, logical block 12335721, lost async page write
[12644.498325] Buffer ...
Are the VMs on local file storage on the hypervisors? Have you monitored disk utilization on your compute nodes? Maybe your disks are too slow, especially if you run heavy I/O, possibly even in parallel.
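If you have not measured it yet, a quick sketch like the one below (only an illustration: it assumes psutil is installed on the compute node, and the device name nvme0n1 is just an example) can sample the throughput of the backing device while a compilation job runs:

# Rough sketch: sample disk throughput on the compute node every 5 seconds.
# Assumes psutil is installed; "nvme0n1" is an example device name, adjust
# it to whatever actually backs your instance storage.
import time
import psutil

DEV = "nvme0n1"

prev = psutil.disk_io_counters(perdisk=True)[DEV]
for _ in range(12):                      # roughly one minute of samples
    time.sleep(5)
    cur = psutil.disk_io_counters(perdisk=True)[DEV]
    read_mb = (cur.read_bytes - prev.read_bytes) / 5 / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 5 / 1e6
    print(f"{DEV}: {read_mb:.1f} MB/s read, {write_mb:.1f} MB/s write")
    prev = cur

Comparing those numbers against what the drive normally delivers should show whether the device is saturated when the I/O errors appear.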
Yes, the VMs are on the local storage of the physical host. I am also using NVMe drives for storage, so the disks should be fast enough.
Anything in the libvirt log on the compute node? Could it be that the disks are qcow2 files (the default) and there is not enough space on the hosts? Any messages in the host’s system log and/or kernel message buffer?
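To rule out the qcow2/free-space question, something along these lines could help (a sketch only: the image path is a placeholder, adjust it to where Nova stores your instances, typically /var/lib/nova/instances):

# Rough sketch: compare a qcow2 image's virtual size with the space left on
# the host filesystem it lives on. The path below is only an example.
import json
import os
import shutil
import subprocess

image = "/var/lib/nova/instances/<instance-uuid>/disk"   # example path

info = json.loads(subprocess.check_output(
    ["qemu-img", "info", "--output=json", image]))
virtual = info["virtual-size"]    # size the guest thinks the disk has
actual = info["actual-size"]      # bytes currently allocated on the host
free = shutil.disk_usage(os.path.dirname(image)).free

print(f"virtual: {virtual/1e9:.1f} GB, allocated: {actual/1e9:.1f} GB, "
      f"host free: {free/1e9:.1f} GB")
if virtual - actual > free:
    print("Image can still grow beyond the free space left on the host.")

If the sparse image can grow past what the host filesystem has free, heavy writes in the guest will fail exactly like the blk_update_request errors above.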
The libvirt log shows errors from a different day:
2019-04-09 07:57:28.013+0000: 3977: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
2019-04-09 08:07:19.385+0000: 4227: error : qemuDomainAgentAvailable:9139 : argument unsupported: QEMU guest agent is not configured