Occasional permission denied creating /var/lib/nova/instances/.directio.test during live migration

asked 2016-05-25 14:05:38 -0600

mallornman

updated 2016-05-25 16:18:10 -0600

I'm setting up a shared GlusterFS filesystem that nova (Kilo) will use as its storage backend. This will give us live migrations and some data redundancy. We're using the RDO release on CentOS 7.

The migration usually works, but every once in a while we'll get this error:

   ERROR oslo_messaging.rpc.dispatcher [req-b8b14414-c7a5-4e06-b3e7-7ff9cbb72016 76a2e5e3849646a0bf525d632ba15836 e010a6ef41fd4c08a2e8f3b5d63c6210 - - -] Exception during message handling: [Errno 13] Permission denied: '/var/lib/nova/instances/.directio.test'

We can do fifty live migrations back and forth without issue, and then this will happen and the instance will go into a weird state where the database thinks it's running on one hypervisor, but the hypervisor thinks it's running somewhere else. Of course, the instance goes down at that point.

I've made the directory world writeable (for testing) and the uids are the same across all servers. Any ideas?

Also, for what it's worth, is there any problem with me just removing the directio test from /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py in our case? We support directio, so I'm fine just setting 'hasDirectIO = True' and bypassing the problematic code. Thoughts?
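For reference, a check like this can be run outside nova to see whether the failure is reproducible directly on the GlusterFS mount. Below is a minimal standalone sketch, assuming the check amounts to creating `.directio.test` with `O_DIRECT` and writing one aligned buffer to it. The function name `supports_direct_io` and the 512-byte default are my own choices for illustration; this is not the code from driver.py.

```python
import errno
import mmap
import os


def supports_direct_io(dirpath, align_size=512):
    """Return True if an O_DIRECT write works in dirpath, False if unsupported.

    Hypothetical helper for illustration only. Errors other than EINVAL
    (for example EACCES, as in the log above) are re-raised to the caller.
    """
    if not hasattr(os, "O_DIRECT"):
        # Platform has no O_DIRECT flag at all.
        return False

    testfile = os.path.join(dirpath, ".directio.test")
    fd = None
    buf = None
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
        # An anonymous mmap is page-aligned, which satisfies the
        # buffer alignment that O_DIRECT writes need.
        buf = mmap.mmap(-1, align_size)
        buf.write(b"x" * align_size)
        os.write(fd, buf)
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # Filesystem rejected O_DIRECT or the write alignment.
            return False
        raise
    finally:
        if fd is not None:
            os.close(fd)
        if buf is not None:
            buf.close()
        try:
            os.unlink(testfile)
        except OSError:
            pass
```

Calling `supports_direct_io("/var/lib/nova/instances")` in a loop as the nova user on each hypervisor might show whether the intermittent `[Errno 13]` comes from the mount itself rather than from anything in the migration path.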



1 answer


answered 2016-05-26 10:39:03 -0600

mallornman

Looking further, we wonder if this is an issue with block sizes and buffer offsets.

From https://www.gluster.org/pipermail/gluster-users/2015-September/023615.html:

Our gluster volume is a 7 x (2 + 1) (distributed disperse), but disperse volumes require "strict alignment of buffer offsets and block sizes. The disperse volume writes fragments multiples of 512 bytes in the bricks. If the underlying filesystem requires an alignment different than 512 bytes, it could cause the problem you are seeing. For example if the required alignment of the underlying filesystem is 4KB, you will need to write to the disperse volume in multiples of 12KB (you are using 3 data bricks)."
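The arithmetic in that quote can be written out as a small sketch. It assumes the rule generalizes to "data bricks times the underlying filesystem's alignment", and that "2 + 1" means 2 data bricks plus 1 redundancy brick. Both function names are made up for illustration.

```python
def min_write_multiple(data_bricks, fs_alignment):
    """Smallest write size (bytes) that stays aligned on every data brick.

    Assumption: each write is split evenly across the data bricks, so each
    fragment must itself be a multiple of the filesystem alignment.
    """
    return data_bricks * fs_alignment


def is_aligned_write(size, data_bricks, fs_alignment):
    """True if a write of `size` bytes is a whole multiple of the minimum."""
    return size % min_write_multiple(data_bricks, fs_alignment) == 0


# The example from the mailing list post: 3 data bricks, 4KB alignment.
print(min_write_multiple(3, 4096))      # 12288 bytes, i.e. 12KB

# Our layout under the assumption above: 2 data bricks per disperse set.
print(min_write_multiple(2, 4096))      # 8192 bytes
print(is_aligned_write(512, 2, 4096))   # False
```

If that reading is right, a 512-byte direct write would not be aligned on our volume once the underlying filesystem wants anything larger than 256 bytes per fragment.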

We use ZFS for our underlying filesystem, but no block size is currently set on it. We can set one, so we'll give that a shot and see what happens.

In the meantime, if anyone has any other ideas...
