[Linux-cluster] new dlm control/configuration

Mark Fasheh mark.fasheh at oracle.com
Thu Mar 31 21:50:36 UTC 2005


On Thu, Mar 31, 2005 at 09:37:51AM +0100, Patrick Caulfield wrote:
> On Thu, Mar 31, 2005 at 04:27:07PM +0800, David Teigland wrote:
> > Sure, the mechanism used to export the locking API to user space is pretty
> > inconsequential.  We're doing reads/writes on a misc device at the moment
> > (used through libdlm of course.)  Going through an fs might be better but
> > I'm not sure why.
> 
> A long time ago, we did consider a filesystem interface to the DLM. We rejected
> it for a couple of reasons:
> 
> 1) the mapping of locks to files is not a very clean one. Trying to squeeze
> things like LVBs and ranges into the API soon gets very messy. Returning status
> from asynchronous operations can make the coding rather complicated for
> applications with multiple locks (you would need a file descriptor open for
> each lock!). Also the hierarchy functions differently: a lock that has children
> is still a lock, not just a directory.

Well, it's actually quite clean in ocfs2_dlmfs. Part of that is likely
due to some design calls we made early on to simplify our userspace
locking: we don't do ranges (anywhere, really), and we consider all userspace
lock requests to be synchronous. The result, however, is a userspace API
which is extremely lightweight and dirt simple to use.

mkdir gives you a new domain; files created within that directory correspond
to lock resources with the same name. Open O_RDONLY gets you a PR mode lock,
open O_RDWR gives you an EX mode lock. You can do NOQUEUE (trylock) ops with
O_NONBLOCK. Reads and writes on the file return and set the LVB, respectively.
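
To make the mapping above concrete, here is a minimal Python sketch of how
a caller might translate a lock mode into open(2) flags. The helper names
(dlmfs_open_flags, take_lock) are hypothetical, and the use of O_CREAT to
create the lock file on first use is an assumption, not something taken
from ocfs2_dlmfs itself.

```python
import os

# Lock-mode names follow the description above:
# PR is taken with a read-only open, EX with a read-write open.
_MODE_FLAGS = {"PR": os.O_RDONLY, "EX": os.O_RDWR}


def dlmfs_open_flags(mode, trylock=False):
    """Return open(2) flags for the requested lock mode (hypothetical helper)."""
    # Assumption: O_CREAT is how the lock resource file gets created.
    flags = _MODE_FLAGS[mode] | os.O_CREAT
    if trylock:
        # O_NONBLOCK asks for a NOQUEUE (trylock) request.
        flags |= os.O_NONBLOCK
    return flags


def take_lock(domain_dir, resource, mode, trylock=False):
    """Open the file named after the lock resource; the fd represents the lock."""
    path = os.path.join(domain_dir, resource)
    return os.open(path, dlmfs_open_flags(mode, trylock), 0o600)
```

With trylock=True, a request that cannot be granted would be expected to
surface as an OSError from os.open rather than block; the exact errno is
not specified here.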

One can literally create a domain, create locks within it, and ship data via
the LVB, all from a bash shell on my cluster nodes.
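
The same sequence (create a domain, take a lock, pass data through the LVB)
can be sketched in Python with plain file operations. This is an
illustration only: the function names are hypothetical, the root argument
stands in for wherever the filesystem is mounted, and the 64-byte read
limit is an assumed LVB size.

```python
import os


def ship_lvb(root, domain, resource, payload):
    """Create the domain, take an EX lock, and store payload in the LVB."""
    domain_dir = os.path.join(root, domain)
    os.makedirs(domain_dir, exist_ok=True)  # mkdir gives a new domain
    path = os.path.join(domain_dir, resource)
    # A read-write open requests an EX mode lock (O_CREAT is an assumption).
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)  # a write sets the LVB
    finally:
        os.close(fd)  # closing the descriptor is assumed to drop the lock
    return path


def read_lvb(path, max_len=64):
    """Take a PR lock and return the current LVB contents."""
    fd = os.open(path, os.O_RDONLY)  # a read-only open requests a PR mode lock
    try:
        return os.read(fd, max_len)  # a read returns the LVB
    finally:
        os.close(fd)
```

On a real cluster, ship_lvb would run on one node and read_lvb on another,
both pointed at the same domain and resource name.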

I was able to write a trivial library wrapper (for those who don't want to
use shell for controlling dlm functionality) in about 600 lines.
	--Mark

> 2) At the time it was still very complicated to add new filesystems to the Linux
> kernel. This has now changed of course.
> 
> I'll have a look at the OCFS2 filesystem and see if we can learn anything from
> it though.
> 
> -- 
> 
> patrick
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
--
Mark Fasheh
Software Developer, Oracle
mark.fasheh at oracle.com



