
VM filesystem shutting down

asked 2019-04-10 06:41:53 -0500 by masber

updated 2019-04-10 06:55:57 -0500

Dear OpenStack community,

I have an OpenStack environment using QEMU + KVM as the hypervisor. My physical hosts run CentOS 7.5 with BTRFS as the host filesystem, and the VMs use XFS. My problem is that a VM becomes unresponsive when I run heavy jobs such as software compilation. Console errors are shown below:

kudu-test-1 login: [   43.865030] random: crng init done
[10769.996980] blk_update_request: I/O error, dev vda, sector 63329504
[10770.003649] blk_update_request: I/O error, dev vda, sector 63329680
[10770.005974] blk_update_request: I/O error, dev vda, sector 63329832
[10770.008282] Buffer I/O error on dev vda1, logical block 7915932, lost async page write
[10770.011173] Buffer I/O error on dev vda1, logical block 7915933, lost async page write
[10770.014903] Buffer I/O error on dev vda1, logical block 7915934, lost async page write
[10770.019257] Buffer I/O error on dev vda1, logical block 7915935, lost async page write
[10770.024083] Buffer I/O error on dev vda1, logical block 7915936, lost async page write
[10770.028534] Buffer I/O error on dev vda1, logical block 7915937, lost async page write
[10770.032472] Buffer I/O error on dev vda1, logical block 7915938, lost async page write
[10770.036191] Buffer I/O error on dev vda1, logical block 7915939, lost async page write
[10770.040515] Buffer I/O error on dev vda1, logical block 7915940, lost async page write
[10770.043443] Buffer I/O error on dev vda1, logical block 7915941, lost async page write
[11963.873944] blk_update_request: I/O error, dev vda, sector 106841432
[11963.922842] buffer_io_error: 9 callbacks suppressed
[11963.926768] Buffer I/O error on dev vda1, logical block 13354411, lost async page write
[11963.930445] Buffer I/O error on dev vda1, logical block 13354412, lost async page write
[11963.933580] Buffer I/O error on dev vda1, logical block 13354413, lost async page write
[11963.937348] Buffer I/O error on dev vda1, logical block 13354414, lost async page write
[11963.941300] Buffer I/O error on dev vda1, logical block 13354415, lost async page write
[11963.944535] Buffer I/O error on dev vda1, logical block 13354416, lost async page write
[11963.947552] Buffer I/O error on dev vda1, logical block 13354417, lost async page write
[11963.953007] Buffer I/O error on dev vda1, logical block 13354418, lost async page write
[11963.956181] Buffer I/O error on dev vda1, logical block 13354419, lost async page write
[11963.959760] Buffer I/O error on dev vda1, logical block 13354420, lost async page write
[12644.472090] blk_update_request: I/O error, dev vda, sector 98732744
[12644.486750] blk_update_request: I/O error, dev vda, sector 98733768
[12644.488994] buffer_io_error: 2507 callbacks suppressed
[12644.490653] Buffer I/O error on dev vda1, logical block 12335719, lost async page write
[12644.493055] Buffer I/O error on dev vda1, logical block 12335720, lost async page write
[12644.495430] Buffer I/O error on dev vda1, logical block 12335721, lost async page write
[12644.498325] Buffer ...

Comments

Are the VMs on local file storage on the hypervisors? Have you monitored disk utilization of your compute nodes? Maybe your disks are too slow, especially if you run heavy i/o, maybe even in parallel?

eblock ( 2019-04-10 07:07:04 -0500 )
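The disk-utilization check suggested above can be done with standard tools. A sketch (`iostat` comes from the sysstat package; the device name is an assumption):

```shell
# Extended device statistics, sampled every 2 seconds, 5 reports.
# Sustained %util near 100 and rising await times while a VM compiles
# would point at a saturated backing disk.
iostat -x 2 5

# Limit the output to the NVMe drive backing the instance files
# (device name is an assumption):
iostat -x 2 5 nvme0n1
```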

Yes, the VMs are on local storage of the physical host. Also, I am using NVMe drives for storage, so the disks should be fast enough.

masber ( 2019-04-10 07:16:02 -0500 )

Anything in the libvirt log on the compute node? Could it be that the disks are qcow2 files (the default) and there is not enough space on the hosts? Any messages in the host’s system log and/or kernel message buffer?

Bernd Bausch ( 2019-04-10 07:27:14 -0500 )
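Both of these suggestions can be checked from the compute node. A sketch, assuming the default Nova instance path (`/var/lib/nova/instances`):

```shell
# Free space on the filesystem holding the instance disks; falls back to /
# if the default Nova path does not exist on this machine.
df -h /var/lib/nova/instances 2>/dev/null || df -h /

# A qcow2 image is sparse: its virtual size can far exceed what is currently
# allocated, so the host can fill up long after the VM was created.
# (Path is a placeholder; requires qemu-img.)
# qemu-img info /var/lib/nova/instances/<instance-uuid>/disk

# Host-side kernel messages often show the underlying cause of guest I/O errors.
# dmesg | grep -iE 'btrfs|blk_update_request|i/o error'
```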

The libvirt log shows errors from a different day:

2019-04-09 07:57:28.013+0000: 3977: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
2019-04-09 08:07:19.385+0000: 4227: error : qemuDomainAgentAvailable:9139 : argument unsupported: QEMU guest agent is not configur

masber ( 2019-04-10 11:16:54 -0500 )

1 answer


answered 2019-04-11 19:19:48 -0500 by masber

updated 2019-04-11 19:20:14 -0500

OK, I don't have a good explanation for this, but I fixed the issue by replacing the filesystem on the physical (hypervisor) nodes, switching from BTRFS to mdadm + XFS. The system is now much more stable.
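For reference, the replacement layout can be sketched roughly as follows. The RAID level, device names, and mount point are all assumptions; the answer does not state them.

```shell
# Mirror two NVMe drives with mdadm (RAID-1 chosen here purely for illustration).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Format the array with XFS and mount it where Nova keeps instance disks
# (default path is an assumption).
mkfs.xfs /dev/md0
mount /dev/md0 /var/lib/nova/instances

# Persist the array definition so it assembles on boot.
mdadm --detail --scan >> /etc/mdadm.conf
```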


Comments

qcow2 on BTRFS seems to be a bad idea. An internet search yields pointers to performance problems, e.g. https://unix.stackexchange.com/questi.... It seems that there are workarounds.

I guess we all learned something :)

Bernd Bausch ( 2019-04-11 19:47:32 -0500 )
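One frequently mentioned workaround for qcow2 on BTRFS is to disable copy-on-write for the image directory. A sketch (the Nova path is an assumption, and the attribute only affects files created after it is set):

```shell
# Disable BTRFS copy-on-write for the directory holding qcow2 images.
# CoW stacked on top of qcow2's own copy-on-write is the usual explanation
# for the fragmentation and slowdown.
chattr +C /var/lib/nova/instances

# Verify: a 'C' in the attribute list means new files here will be NOCOW.
lsattr -d /var/lib/nova/instances

# Alternatively, mount the whole BTRFS filesystem with -o nodatacow.
```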
