[libvirt] Use flock() instead of fcntl()

David Weber wb at munzinger.de
Mon Jul 29 13:48:03 UTC 2013


On Monday, 29 July 2013, 12:52:00, Daniel P. Berrange wrote:
> On Fri, Jul 26, 2013 at 11:35:32AM +0100, Daniel P. Berrange wrote:
> > On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
> > > On Friday, 26 July 2013, 10:14:59, Daniel P. Berrange wrote:
> > > > On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
> > > > > > Looking again at flock() I see it cannot support locking of
> > > > > > ranges, only the entire file. This makes it unsuitable as a
> > > > > > replacement for libvirt's use of fcntl() I'm afraid. I can only
> > > > > > suggest you configure OCFS2 so that it supports fcntl(), or set
> > > > > > up virtlockd to use separate indirect leases on a different
> > > > > > shared filesystem, or perhaps try sanlock instead, which doesn't
> > > > > > require any special filesystem support.
> > > > > 
> > > > > It's true that flock() doesn't support locking of ranges but I
> > > > > can't see how this is necessary.
> > > > 
> > > > The code may not currently use ranges, but that doesn't mean it'll
> > > > stay that way. By adding support for flock() we're preventing us
> > > > from making use of this feature in the future, and I don't want to
> > > > see that.
> > > 
> > > Just curious, what would be a possible feature which would require
> > > range based locking?
> > > 
> > > I would really like to see flock() support in virtlockd because all
> > > other solutions have major drawbacks for me.
> > 
> > Currently we use locks to protect the content of disk images.
> > 
> > During startup/shutdown, however, libvirt also makes changes to the
> > metadata of images by setting SELinux labels, uid/gid ownership and
> > potentially ACLs. Currently we've deliberately crippled some of our
> > code during shutdown since it isn't safe in the face of multiple
> > libvirt instances running. We need to introduce locking of file metadata
> > to protect this code. The metadata locks, however, must not conflict
> > with the content lock. Thus the reason why we only lock a single
> > byte (range 0-1) for content locks is that we want to be able to take
> > additional locks (range 1-2 or similar) for the metadata locks
> > on the same files.

Perhaps this was already your plan, but the second lock mustn't come from the
same process, because it will reset the first lock if you use fcntl(). For
more details, see:
http://0pointer.de/blog/projects/locking.html
http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
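
To make the byte-range idea and the same-process pitfall concrete, here is a
minimal Python sketch. It is not libvirt code: the offsets, the helper names
and the temporary file are assumptions made for illustration. It takes a
"content" lock on byte 0, checks from a forked child that byte 1 is still
free, and then shows the behaviour the articles above warn about: closing any
other descriptor for the same file drops every fcntl() lock the process holds
on it.

```python
import fcntl
import os
import tempfile

CONTENT_BYTE = 0   # hypothetical offset used for the content lock
METADATA_BYTE = 1  # hypothetical offset used for the metadata lock


def try_lock_byte(fd, offset):
    """Try to take an exclusive, non-blocking POSIX lock on a single byte."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        return False


def child_can_lock(path, offset):
    """Fork a child process and report whether it could lock the byte."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            status = 0 if try_lock_byte(fd, offset) else 1
        finally:
            os._exit(status)  # the child's own locks vanish when it exits
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    results = {}
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        results["content_locked"] = try_lock_byte(fd, CONTENT_BYTE)
        # A different byte range does not conflict with the content lock.
        results["metadata_free"] = child_can_lock(tmp.name, METADATA_BYTE)
        # The same byte is refused while the parent holds it.
        results["content_blocked"] = not child_can_lock(tmp.name, CONTENT_BYTE)
        # Pitfall: opening and closing an unrelated descriptor for the same
        # file silently releases all fcntl() locks held by this process.
        os.close(os.open(tmp.name, os.O_RDONLY))
        results["lost_after_close"] = child_can_lock(tmp.name, CONTENT_BYTE)
        os.close(fd)
    return results


if __name__ == "__main__":
    print(demo())
```

The last check is the reason a daemon holding fcntl() locks has to be very
careful about which code paths open and close the locked files.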

> 
> I've been wondering over the weekend if there is any viable alternative
> strategy we could take which could allow us to use flock(), without
> badly compromising our option for metadata locking. Given that metadata
> locks would be only held for very short periods of time, I think it
> could be reasonable to say we don't need to do locks on the individual
> disks files. It would be sufficient to do locks on the directory
> containing the disk file, if we only have flock() available. This would
> mean we wouldn't be blocked by the inability to lock byte ranges.

Sounds good!
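
A rough Python sketch of how such a directory-level metadata lock could look
with flock(). This is only an illustration under my own assumptions: the
helper name `directory_lock` and the non-blocking behaviour are invented here
and say nothing about how virtlockd would actually implement it.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def directory_lock(disk_path):
    """Hold an exclusive flock() on the directory containing disk_path."""
    directory = os.path.dirname(os.path.abspath(disk_path))
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        # flock() always covers the whole object, so no byte range is needed.
        fcntl.flock(dir_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield dir_fd
    finally:
        os.close(dir_fd)  # closing the descriptor releases the flock()


def demo():
    with tempfile.TemporaryDirectory() as d:
        disk = os.path.join(d, "disk.img")
        open(disk, "w").close()
        with directory_lock(disk):
            # A second open file description conflicts with the first one.
            try:
                with directory_lock(disk):
                    second_succeeded = True
            except BlockingIOError:
                second_succeeded = False
        # Once the first lock is released, locking works again.
        with directory_lock(disk):
            relock_succeeded = True
    return second_succeeded, relock_succeeded


if __name__ == "__main__":
    print(demo())
```

Because the lock sits on the directory and not on the disk file, it cannot
collide with a whole-file flock() content lock on the image itself. The price
is that metadata changes for all images in one directory are serialised.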

> 
> 
> I'm also wondering if there is a way to detect when fcntl() is not
> available for OCFS2? With the current virtlockd code, what is the
> behaviour when it tries to lock a file with fcntl()? Does the fcntl()
> lock attempt succeed, but only provide protection scoped to the single
> host, or do we get a hard errno from fcntl() on OCFS2 (e.g. EINVAL or
> something)? If we can detect broken fcntl() on OCFS2, then we should
> not need to have a global config parameter - we would be able to
> automatically use fcntl() by default, and fall back to flock() on
> OCFS2 deployments which aren't using cluster technology to enable
> fcntl().

Unfortunately there seems to be no way to detect whether fcntl() is cluster
aware. When OCFS2 is used without a userspace cluster stack, the fcntl() locks
only exist on the local machine. The other machines in the cluster don't see
them.

So we would still need a config parameter, or libvirt always uses flock() and
doesn't care about all the crappy NFS servers which don't support flock() :)
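
Since the choice can't be detected at runtime, it would have to be made by the
admin. A small Python sketch of what such a switch amounts to; the parameter
name `lock_method` and its two values are purely hypothetical and not an
existing virtlockd option.

```python
import fcntl
import os
import tempfile


def acquire(fd, lock_method):
    """Take an exclusive, non-blocking lock using the configured primitive."""
    if lock_method == "fcntl":
        # POSIX lock on a single byte; other ranges stay available.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    elif lock_method == "flock":
        # BSD lock; always covers the whole file.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    else:
        raise ValueError("unknown lock_method: %r" % (lock_method,))


def demo():
    outcomes = {}
    for method in ("fcntl", "flock"):
        with tempfile.NamedTemporaryFile() as tmp:
            acquire(tmp.fileno(), method)
            outcomes[method] = True
    return outcomes


if __name__ == "__main__":
    print(demo())
```

Both calls succeed on a local filesystem, which is exactly the problem: a
successful return says nothing about whether other hosts can see the lock.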

David


> 
> Daniel
