From saffroy at gmail.com  Tue Dec  6 16:18:44 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 6 Dec 2016 17:18:44 +0100 (CET)
Subject: [Linux-cluster] DLM user API for lock value block
Message-ID: <alpine.DEB.2.11.1612061708270.2863@erda.mds>

Hi,

I am trying to use the DLM userland API (libdlm3), and while I was able to 
do plain lock acquisitions and conversions, I am stuck trying to update 
and then read the lock value block.

Does anyone have working examples of this? I did look at the rhdlmbook 
doc, but couldn't find one.

Attached is a messy test I wrote. It fails because up-converting a lock 
with LKF_VALBLK set doesn't seem to overwrite the buffer I provide for 
the lock value block (and with strace it looks like the kernel device 
returns the LVB on a down-conversion! weird). Example output below.


Cheers,
Jean-Marc

-- 
saffroy at gmail.com

$ make D=1
gcc -D_REENTRANT -Wall -Werror   -O0 -g    locklvb.c  -pthread -ldlm -lpthread   -o locklvb

$ ./locklvb 
dlm_kernel_version 6.0.1
create_lockspace
create_lockspace: Operation not permitted
open_lockspace
dlm_pthread_init
acquiring NL on MyLock...
LOCK mode -> NL convert 0
read_lvb 0 write_lvb 0
completion ast
entering loop on lock #1
count 0
LOCK mode -> PW convert 1
read_lvb 0 write_lvb 0
completion ast
init lvb => 51
lvb cache => 52
LOCK mode -> CR convert 1
read_lvb 0 write_lvb 1
completion ast
count 1
LOCK mode -> PW convert 1
read_lvb 1 write_lvb 0
completion ast
read lvb -1
locklvb: locklvb.c:177: do_lock: Assertion `lvb_lock.val >= 0' failed.
Aborted (core dumped)
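
The behaviour the test expects can be written down as a small rule. The sketch below is a simplified model only, not the kernel's actual table: it assumes the LVB is returned to the caller on an up-conversion and stored from the caller's buffer on a down-conversion out of PW or EX. The names `mode`, `MODE_*` and `lvb_direction` are hypothetical and do not come from libdlm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode names, ordered NL < CR < CW < PR < PW < EX.
 * (Treating CW < PR as a strict ordering is a simplification.) */
enum mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

/*
 * Simplified model of which way the lock value block should move
 * when a lock is converted from mode `from` to mode `to`:
 *   read_lvb  - the resource's LVB is copied into the caller's buffer
 *   write_lvb - the caller's buffer is stored as the resource's LVB
 */
void lvb_direction(enum mode from, enum mode to,
                   bool *read_lvb, bool *write_lvb)
{
    *read_lvb = to > from;                       /* up-conversion */
    *write_lvb = from >= MODE_PW && to < from;   /* down from PW/EX */

    /* In this model a single conversion never does both. */
    assert(!(*read_lvb && *write_lvb));
}
```

Under this model PW -> CR gives write_lvb and CR -> PW gives read_lvb, which agrees with the flags printed for the last two conversions in the output above. The first PW conversion prints read_lvb 0, apparently because the test initializes the LVB at that point rather than reading it.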
-------------- next part --------------
A non-text attachment was scrubbed...
Name: locklvb.c
Type: text/x-csrc
Size: 8573 bytes
Desc: 
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20161206/64fd2e0b/attachment.bin>

From teigland at redhat.com  Tue Dec  6 16:50:45 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 6 Dec 2016 10:50:45 -0600
Subject: [Linux-cluster] DLM user API for lock value block
In-Reply-To: <alpine.DEB.2.11.1612061708270.2863@erda.mds>
References: <alpine.DEB.2.11.1612061708270.2863@erda.mds>
Message-ID: <20161206165045.GA11126@redhat.com>

On Tue, Dec 06, 2016 at 05:18:44PM +0100, Jean-Marc Saffroy wrote:
> Hi,
> 
> I am trying to use the DLM userland API (libdlm3), and while I was able to 
> do plain lock acquisitions and conversions, I am stuck trying to update 
> and then read the lock value block.
> 
> Does anyone have working examples of this? I did look at the rhdlmbook 
> doc, but couldn't find one.

I haven't looked at your test to check if you're actually seeing this bug,
but you'll want this fix in any case:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm/user.c?id=b96f465035f9fae83c1d8de3e80eecfe6877608c

In the following lvmlockd code, you can see an example of working around
that bug if you don't have immediate access to a newer kernel:

https://git.fedorahosted.org/cgit/lvm2.git/tree/daemons/lvmlockd/lvmlockd-dlm.c

There are some other random userland tests here that use lvbs:

https://fedorapeople.org/cgit/teigland/public_git/dct-stuff.git/tree/dlm

Dave


From saffroy at gmail.com  Tue Dec  6 18:02:58 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 6 Dec 2016 19:02:58 +0100 (CET)
Subject: [Linux-cluster] DLM user API for lock value block
In-Reply-To: <20161206165045.GA11126@redhat.com>
References: <alpine.DEB.2.11.1612061708270.2863@erda.mds>
	<20161206165045.GA11126@redhat.com>
Message-ID: <alpine.DEB.2.11.1612061852150.2863@erda.mds>

On Tue, 6 Dec 2016, David Teigland wrote:

> I haven't looked at your test to check if you're actually seeing this bug,
> but you'll want this fix in any case:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm/user.c?id=b96f465035f9fae83c1d8de3e80eecfe6877608c

That's definitely the issue I have.

> In the following lvmlockd code, you can see an example of working around
> that bug if you don't have immediate access to a newer kernel:
> 
> https://git.fedorahosted.org/cgit/lvm2.git/tree/daemons/lvmlockd/lvmlockd-dlm.c

Indeed, I have to work with not-so-recent distributions and their kernels, 
so a workaround is much needed.

Adding a similar workaround in my test does help! But only with a single 
process, because with more I quickly get a conversion deadlock error. :( I 
will need to think more about this.

Thanks a lot for the pointers!


Cheers,
JM

-- 
saffroy at gmail.com


From wferi at niif.hu  Sat Dec 17 20:18:04 2016
From: wferi at niif.hu (Ferenc Wágner)
Date: Sat, 17 Dec 2016 21:18:04 +0100
Subject: [Linux-cluster] Status of Git DLM
Message-ID: <87shpmux6r.fsf@lant.ki.iif.hu>

Hi David,

Is the current DLM HEAD (d5d7b8dd) stable enough for packaging?  If so,
could you please tag and release it?
-- 
Thanks,
Feri


From saffroy at gmail.com  Sun Dec 18 19:42:49 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Sun, 18 Dec 2016 20:42:49 +0100 (CET)
Subject: [Linux-cluster] DLM user API for blocking AST
Message-ID: <alpine.DEB.2.11.1612182020130.2863@erda.mds>

Hi,

Continuing with my experiments with the DLM user API, I am trying to use 
blocking AST callbacks, and find that the rules for the lifetime and 
ownership of the dlm_lksb struct are a bit surprising. This led me to some 
investigations, and the question at the end of this email.

It looks like the kernel remembers the pointer to the lksb struct used to 
issue the dlm_lock call, and libdlm happily overwrites this piece of 
memory whenever the kernel issues an event related to that lock, including 
just before firing a BAST callback. It is a bit frustrating because I got 
caught by surprise wondering why something was smashing my stack, ie. the 
place where I had once laid out my dlm_lksb, thinking that it was okay to 
release its memory after the completion AST callback has completed.

For now I have (apparently) working test code that deals with this in the 
following way: for a given lock (identified by its lockid), I keep two 
dlm_lksb structs and a bit indicating which of the two is free to use for 
conversions. I update the bit every time the CAST (not BAST) callback 
completes, thus doing some kind of double buffering.
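
A minimal sketch of that double buffering, under stated assumptions: `struct lksb_like` is a stand-in with the same fields as `struct dlm_lksb` so that the sketch compiles without libdlm, and `lock_slot`, `slot_begin_request`, `slot_cast_done` and `slot_current` are hypothetical names, not part of any DLM API. The real lock and conversion calls are left out; only the bookkeeping is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct dlm_lksb (assumption: same four fields). */
struct lksb_like {
    int      sb_status;
    uint32_t sb_lkid;
    char     sb_flags;
    char    *sb_lvbptr;
};

/* Per-lock state: two status blocks and a bit saying which one is free. */
struct lock_slot {
    struct lksb_like lksb[2];
    unsigned free_idx;  /* lksb[free_idx] may be given to the next request */
    int pending;        /* nonzero between a request and its CAST */
};

void slot_init(struct lock_slot *s)
{
    memset(s, 0, sizeof(*s));
}

/* The lksb that the kernel/libdlm is assumed to own right now. */
struct lksb_like *slot_current(struct lock_slot *s)
{
    return &s->lksb[s->free_idx ^ 1u];
}

/* Pick the lksb to pass to the next lock or conversion call. */
struct lksb_like *slot_begin_request(struct lock_slot *s)
{
    struct lksb_like *cur = slot_current(s);
    struct lksb_like *next = &s->lksb[s->free_idx];

    assert(!s->pending);            /* one request in flight per lock */
    next->sb_status = 0;
    next->sb_flags = 0;
    next->sb_lkid = cur->sb_lkid;   /* conversions reuse the lock id */
    next->sb_lvbptr = cur->sb_lvbptr;
    s->pending = 1;
    return next;
}

/* Call at the end of the completion AST (CAST), never from the BAST. */
void slot_cast_done(struct lock_slot *s)
{
    assert(s->pending);
    s->pending = 0;
    s->free_idx ^= 1u;  /* the lksb just used is now owned by the kernel */
}
```

Until `slot_cast_done` runs, `slot_current` still points at the previously submitted lksb, so a BAST arriving in that window is assumed to touch only memory that is still alive.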

So I assume that:

- each lock acquisition or conversion call gives ownership of the lksb to 
the kernel and libdlm (because a BAST callback can fire at any time and 
will overwrite the struct), causing the kernel/libdlm to forget about the 
previously owned lksb (meaning the caller can/should then dispose of it)

- AST and BAST callbacks run in order, such that after the CAST completes, 
and until a conversion occurs, a BAST firing will only overwrite the lksb 
given on the last lock or conversion

Are my assumptions correct?


Cheers,
Jean-Marc

-- 
saffroy at gmail.com


From saffroy at gmail.com  Sun Dec 18 19:46:25 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Sun, 18 Dec 2016 20:46:25 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
	versions
Message-ID: <alpine.DEB.2.11.1612182042540.2863@erda.mds>

Hi (again),

Another question I have regarding DLM and Corosync (because Corosync is 
required to use DLM): should I expect compatibility across versions?

I did a quick test between distributions running different kernels (CentOS 
6, CentOS 7 and Ubuntu 14) but rather close versions of Corosync, and that 
test worked, but I am not sure if that was just luck. ;)


Cheers,
Jean-Marc

-- 
saffroy at gmail.com


From teigland at redhat.com  Mon Dec 19 17:59:42 2016
From: teigland at redhat.com (David Teigland)
Date: Mon, 19 Dec 2016 11:59:42 -0600
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To: <alpine.DEB.2.11.1612182020130.2863@erda.mds>
References: <alpine.DEB.2.11.1612182020130.2863@erda.mds>
Message-ID: <20161219175942.GB13720@redhat.com>

On Sun, Dec 18, 2016 at 08:42:49PM +0100, Jean-Marc Saffroy wrote:
> Hi,
> 
> Continuing with my experiments with the DLM user API, I am trying to use 
> blocking AST callbacks, and find that the rules for the lifetime and 
> ownership of the dlm_lksb struct are a bit surprising. This led me to some 
> investigations, and the question at the end of this email.

Hi, you're discovering just how old and crusty this userland interface is.
Sorry about that :)

This userland interface is left over from the earliest dlm implementation,
which could generously be called experimental.  It has long needed a
thorough redesign, but because the dlm is not heavily used from userland,
and because user/kernel interfaces are hard, it's never been done.

> It looks like the kernel remembers the pointer to the lksb struct used to 
> issue the dlm_lock call, and libdlm happily overwrites this piece of 
> memory whenever the kernel issues an event related to that lock, including 
> just before firing a BAST callback. It is a bit frustrating because I got 
> caught by surprise wondering why something was smashing my stack, ie. the 
> place where I had once laid out my dlm_lksb, thinking that it was okay to 
> release its memory after the completion AST callback has completed.

The way the kernel saves and restores the pointers is very unpleasant, and
handling lifetimes of lock structs/memory is a big pain.

In a quick look at this code, I'm not seeing any simple or obvious ways to
avoid the lksb behavior you're describing, but it's been a while since I
was very familiar with this area.  With more study, it's possible that a
fix could be found, but it seems a bit unlikely.

As a workaround to avoid an unwanted bast callback after a completion, I
wonder if you could make a no-op call with NULL astaddr/astarg to prevent
any further callback using those?

> For now I have (apparently) working test code that deals with this in the 
> following way: for a given lock (identified by its lockid), I keep two 
> dlm_lksb structs and a bit indicating which of the two is free to use for 
> conversions. I update the bit every time the CAST (not BAST) callback 
> completes, thus doing some kind of double buffering.

OK, I don't know enough about the details to say whether there are any
subtle issues with this or not.

> So I assume that:
> 
> - each lock acquisition or conversion call gives ownership of the lksb to 
> the kernel and libdlm (because a BAST callback can fire at any time and 
> will overwrite the struct), causing the kernel/libdlm to forget about the 
> previously owned lksb (meaning the caller can/should then dispose of it)

That sounds about right.

> - AST and BAST callbacks run in order, such that after the CAST completes, 
> and until a conversion occurs, a BAST firing will only overwrite the lksb 
> given on the last lock or conversion
> 
> Are my assumptions correct?

That also sounds like it should be true.

To say with more certainty would require closer study of the code, because
whatever rules exist are a function of the current implementation, and not
derived from higher design rules per se.

Dave


From teigland at redhat.com  Mon Dec 19 18:14:39 2016
From: teigland at redhat.com (David Teigland)
Date: Mon, 19 Dec 2016 12:14:39 -0600
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
 versions
In-Reply-To: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
References: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
Message-ID: <20161219181439.GC13720@redhat.com>

On Sun, Dec 18, 2016 at 08:46:25PM +0100, Jean-Marc Saffroy wrote:
> Hi (again),
> 
> Another question I have regarding DLM and Corosync (because Corosync is 
> required to use DLM): should I expect compatibility across versions?
> 
> I did a quick test between distributions running different kernels (CentOS 
> 6, CentOS 7 and Ubuntu 14) but rather close versions of Corosync, and that 
> test worked, but I am not sure if that was just luck. ;)

I can only speak for the dlm part of that.  Between different
distributions, I'd call it luck :)  Within the context of one distribution
things shouldn't break if the distribution is doing its job.  Upstream,
with no distribution context, I'm certainly aware of when compatibility
breaks between dlm_controld and corosync and between different
dlm_controld versions on nodes.  I try to avoid it, but there are
unpredictable reasons that it can break.

Dave


From saffroy at gmail.com  Tue Dec 20 01:15:28 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 02:15:28 +0100 (CET)
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To: <20161219175942.GB13720@redhat.com>
References: <alpine.DEB.2.11.1612182020130.2863@erda.mds>
	<20161219175942.GB13720@redhat.com>
Message-ID: <alpine.DEB.2.11.1612200207210.2863@erda.mds>

On Mon, 19 Dec 2016, David Teigland wrote:

> As a workaround to avoid an unwanted bast callback after a completion, I
> wonder if you could make a no-op call with NULL astaddr/astarg to prevent
> any further callback using those?

I assume that what you call a no-op is a lock conversion towards the same 
mode as before, correct?

Then I think for this to work we need the second assumption I made, ie. 
the kernel should not deliver the bast event to userland after it 
delivered the cast for the last conversion.

[... assumptions ...]
> That also sounds like it should be true.
> 
> To say with more certainty would require closer study of the code, because
> whatever rules exist are a function of the current implementation, and not
> derived from higher design rules per se.

Ok, I understand.

Thank you David!


Cheers,
JM

-- 
saffroy at gmail.com


From saffroy at gmail.com  Tue Dec 20 01:56:51 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 02:56:51 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
 versions
In-Reply-To: <20161219181439.GC13720@redhat.com>
References: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
	<20161219181439.GC13720@redhat.com>
Message-ID: <alpine.DEB.2.11.1612200218080.2863@erda.mds>

On Mon, 19 Dec 2016, David Teigland wrote:

> On Sun, Dec 18, 2016 at 08:46:25PM +0100, Jean-Marc Saffroy wrote:
> > Hi (again),
> > 
> > Another question I have regarding DLM and Corosync (because Corosync is 
> > required to use DLM): should I expect compatibility across versions?
> > 
> > I did a quick test between distributions running different kernels (CentOS 
> > 6, CentOS 7 and Ubuntu 14) but rather close versions of Corosync, and that 
> > test worked, but I am not sure if that was just luck. ;)
> 
> I can only speak for the dlm part of that.  Between different
> distributions, I'd call it luck :) 

Ah, so that could be a serious problem for me. I hoped to be able to use 
dlm across distributions without having to qualify each possible 
combination...

> Within the context of one distribution things shouldn't break if the 
> distribution is doing its job.

Does that mean that, for example, I could expect dlm instances in RHEL6 
and RHEL7 kernels to work together?

> Upstream, with no distribution context, I'm certainly aware of when 
> compatibility breaks between dlm_controld and corosync and between 
> different dlm_controld versions on nodes.  I try to avoid it, but there 
> are unpredictable reasons that it can break.

How could instances of dlm_controld interact badly? I thought they were 
just glue between dlm and corosync, and never directly talk on the 
network. Do they have network-visible side effects on dlm/corosync?

In the end, I need to work across distributions and their kernels, but I 
could build from source a specific version of corosync and (the userland 
part of) dlm. I expect that the kernel interface to dlm is stable 
(right?), so the biggest risk would be incompatibilities in the dlm 
protocol on the network. Is this protocol stable? With git I see that 
DLM_HEADER_MAJOR/MINOR macros changed very rarely in recent years but I 
can't tell if this is a good indicator.
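
To illustrate why a rarely changing major number would matter, here is a hedged sketch. It assumes, without confirmation in this thread, that the header version word carries the major number in its upper 16 bits and the minor number in its lower 16 bits, and that a peer is rejected only on a major mismatch. The `EXAMPLE_*` values and both function names are illustrative, not taken from the dlm source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; check the dlm source for the real ones. */
#define EXAMPLE_HEADER_MAJOR 0x00030000u
#define EXAMPLE_HEADER_MINOR 0x00000001u

/* Combine a major (upper 16 bits) and a minor (lower 16 bits). */
uint32_t make_header_version(uint32_t major, uint32_t minor)
{
    assert((major & 0x0000FFFFu) == 0);
    assert((minor & 0xFFFF0000u) == 0);
    return major | minor;
}

/* Assumed acceptance rule: only the major part has to match. */
bool header_version_accepted(uint32_t peer_version)
{
    return (peer_version & 0xFFFF0000u) == EXAMPLE_HEADER_MAJOR;
}
```

If that rule holds, a minor bump alone would not stop two nodes from talking, while a major bump would.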


Cheers,
JM

-- 
saffroy at gmail.com


From jfriesse at redhat.com  Tue Dec 20 10:21:35 2016
From: jfriesse at redhat.com (Jan Friesse)
Date: Tue, 20 Dec 2016 11:21:35 +0100
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
 versions
In-Reply-To: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
References: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
Message-ID: <5859062F.4080109@redhat.com>

> Hi (again),
>
> Another question I have regarding DLM and Corosync (because Corosync is
> required to use DLM): should I expect compatibility across versions?

I will add just Corosync information.

Corosync with the same major version is compatible (i.e. 2.3.0 works with 
2.4.0), but if the major version differs it is incompatible (so no luck with 
1.4.2 vs 2.3.0). In 3.x we plan to (finally) implement a proper protocol 
(instead of sending 64-bit aligned C structures as it is now), so 
backwards/forwards compatibility should improve (but it also means that 
3.x is going to be incompatible with 2.x and 1.x).
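
The major-version rule above can be sketched as a small check. This is only an illustration of the rule as stated here: `corosync_major` and `corosync_versions_compatible` are hypothetical helper names, not Corosync API, and the version strings are assumed to look like "2.4.0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the leading major number of a string such as "2.4.0".
 * Returns -1 if the string does not start with a non-negative number. */
long corosync_major(const char *version)
{
    char *end;
    long major;

    assert(version != NULL);
    major = strtol(version, &end, 10);
    if (end == version || major < 0)
        return -1;
    return major;
}

/* Two versions are treated as compatible when their majors match. */
bool corosync_versions_compatible(const char *a, const char *b)
{
    long ma = corosync_major(a);
    long mb = corosync_major(b);

    return ma >= 0 && ma == mb;
}
```

With this check, "2.3.0" and "2.4.0" are compatible, while "1.4.2" and "2.3.0" are not.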

As for distribution versions: at least Fedora/RHEL carries no extra patches 
for Corosync (even though we may backport some fixes/features, we never break 
compatibility with upstream). I believe most other distros work in a 
similar way. So as long as you keep Corosync versions close, Corosync works.


>
> I did a quick test between distributions running different kernels (CentOS
> 6, CentOS 7 and Ubuntu 14) but rather close versions of Corosync, and that
> test worked, but I am not sure if that was just luck. ;)
>
>
> Cheers,
> Jean-Marc
>


From saffroy at gmail.com  Tue Dec 20 10:33:59 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 11:33:59 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
 versions
In-Reply-To: <5859062F.4080109@redhat.com>
References: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
	<5859062F.4080109@redhat.com>
Message-ID: <alpine.DEB.2.11.1612201133100.2863@erda.mds>

Thanks Jan for these details!


Cheers,
JM

-- 
saffroy at gmail.com


From teigland at redhat.com  Tue Dec 20 14:55:13 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 20 Dec 2016 08:55:13 -0600
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To: <alpine.DEB.2.11.1612200207210.2863@erda.mds>
References: <alpine.DEB.2.11.1612182020130.2863@erda.mds>
	<20161219175942.GB13720@redhat.com>
	<alpine.DEB.2.11.1612200207210.2863@erda.mds>
Message-ID: <20161220145513.GA4800@redhat.com>

> > As a workaround to avoid an unwanted bast callback after a completion, I
> > wonder if you could make a no-op call with NULL astaddr/astarg to prevent
> > any further callback using those?
> 
> I assume that what you call a no-op is a lock conversion towards the same 
> mode as before, correct?

That should work.

> Then I think for this to work we need the second assumption I made, ie. 
> the kernel should not deliver the bast event to userland after it 
> delivered the cast for the last conversion.

The suggestion just depended on clearing the astaddr/astarg values in
the kernel (which will itself have some effect on callbacks being sent or
not).  I hoped that might be a way to avoid analyzing callback ordering.

Dave


From teigland at redhat.com  Tue Dec 20 15:13:54 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 20 Dec 2016 09:13:54 -0600
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across
 versions
In-Reply-To: <alpine.DEB.2.11.1612200218080.2863@erda.mds>
References: <alpine.DEB.2.11.1612182042540.2863@erda.mds>
	<20161219181439.GC13720@redhat.com>
	<alpine.DEB.2.11.1612200218080.2863@erda.mds>
Message-ID: <20161220151354.GB4800@redhat.com>

On Tue, Dec 20, 2016 at 02:56:51AM +0100, Jean-Marc Saffroy wrote:
> Ah, so that could be a serious problem for me. I hoped to be able to use 
> dlm across distributions without having to qualify each possible 
> combination...
> 
> > Within the context of one distribution things shouldn't break if the 
> > distribution is doing its job.
> 
> Does that mean that, for example, I could expect dlm instances in RHEL6 
> and RHEL7 kernels to work together?

I'm not sure you're talking to the right person any more.  In the history
of dlm in RHEL, the userland bits have usually only worked together within
one major release.

> How could instances of dlm_controld interact badly? I thought they were 
> just glue between dlm and corosync, and never directly talk on the 
> network. Do they have network-visible side effects on dlm/corosync?

dlm_controld instances communicate using a protocol that could change.
This is for coordinating lockspace recovery, not performing locking.

> In the end, I need to work across distributions and their kernels, but I 
> could build from source a specific version of corosync and (the userland 
> part of) dlm. I expect that the kernel interface to dlm is stable 
> (right?), 

right

> so the biggest risk would be incompatibilities in the dlm 
> protocol on the network. Is this protocol stable? With git I see that 
> DLM_HEADER_MAJOR/MINOR macros changed very rarely in recent years but I 
> can't tell if this is a good indicator.

It's quite stable.  Your situation sounds very unique.  If you can choose
and build userspace code yourself, then you don't need to worry about
distributions I guess.

Dave