From saffroy at gmail.com Tue Dec 6 16:18:44 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 6 Dec 2016 17:18:44 +0100 (CET)
Subject: [Linux-cluster] DLM user API for lock value block
Message-ID:

Hi,

I am trying to use the DLM userland API (libdlm3), and while I was able to do plain lock acquisitions and conversions, I am stuck trying to update and then read the lock value block.

Does anyone have working examples of this? I did look at the rhdlmbook doc, but couldn't find one.

Attached is a messy test I wrote, which fails because up-converting a lock with LKF_VALBLK set doesn't seem to overwrite the buffer I provide for the lock value block (and with strace it looks like the kernel device returns the LVB on a down-conversion! weird).

Example output below.

Cheers,
Jean-Marc

-- 
saffroy at gmail.com

$ make D=1
gcc -D_REENTRANT -Wall -Werror -O0 -g locklvb.c -pthread -ldlm -lpthread -o locklvb
$ ./locklvb
dlm_kernel_version 6.0.1
create_lockspace
create_lockspace: Operation not permitted
open_lockspace
dlm_pthread_init
acquiring NL on MyLock...
LOCK mode -> NL convert 0 read_lvb 0 write_lvb 0
completion ast
entering loop on lock #1
count 0
LOCK mode -> PW convert 1 read_lvb 0 write_lvb 0
completion ast
init lvb => 51
lvb cache => 52
LOCK mode -> CR convert 1 read_lvb 0 write_lvb 1
completion ast
count 1
LOCK mode -> PW convert 1 read_lvb 1 write_lvb 0
completion ast
read lvb -1
locklvb: locklvb.c:177: do_lock: Assertion `lvb_lock.val >= 0' failed.
Aborted (core dumped)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: locklvb.c
Type: text/x-csrc
Size: 8573 bytes
Desc:
URL:

From teigland at redhat.com Tue Dec 6 16:50:45 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 6 Dec 2016 10:50:45 -0600
Subject: [Linux-cluster] DLM user API for lock value block
In-Reply-To:
References:
Message-ID: <20161206165045.GA11126@redhat.com>

On Tue, Dec 06, 2016 at 05:18:44PM +0100, Jean-Marc Saffroy wrote:
> Hi,
>
> I am trying to use the DLM userland API (libdlm3), and while I was able to
> do plain lock acquisitions and conversions, I am stuck trying to update
> and then read the lock value block.
>
> Does anyone have working examples of this? I did look at the rhdlmbook
> doc, but couldn't find one.

I haven't looked at your test to check if you're actually seeing this bug, but you'll want this fix in any case:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm/user.c?id=b96f465035f9fae83c1d8de3e80eecfe6877608c

In the following lvmlockd code, you can see an example of working around that bug if you don't have immediate access to a newer kernel:

https://git.fedorahosted.org/cgit/lvm2.git/tree/daemons/lvmlockd/lvmlockd-dlm.c

There are some other random userland tests here that use lvbs:

https://fedorapeople.org/cgit/teigland/public_git/dct-stuff.git/tree/dlm

Dave

From saffroy at gmail.com Tue Dec 6 18:02:58 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 6 Dec 2016 19:02:58 +0100 (CET)
Subject: [Linux-cluster] DLM user API for lock value block
In-Reply-To: <20161206165045.GA11126@redhat.com>
References: <20161206165045.GA11126@redhat.com>
Message-ID:

On Tue, 6 Dec 2016, David Teigland wrote:

> I haven't looked at your test to check if you're actually seeing this bug,
> but you'll want this fix in any case:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm/user.c?id=b96f465035f9fae83c1d8de3e80eecfe6877608c

That's definitely the issue I have.
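For readers following the lock value block discussion, the behaviour the test expects can be modelled without a cluster. What follows is a simplified stand-alone sketch, not libdlm or kernel code: the enum names, struct names, `lvb_rule()` and `convert()` are all hypothetical, and the 32-byte size is an assumption. It assumes the commonly documented DLM rule that converting down (or sideways) from PW or EX with the value-block flag writes the holder's LVB to the resource, while a new grant or an up-conversion reads the resource's LVB into the holder's buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the DLM lock modes, weakest to strongest. */
enum mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

enum lvb_action { LVB_NONE, LVB_READ, LVB_WRITE };

#define LVB_LEN 32 /* assumed LVB size for this model */

struct resource { char lvb[LVB_LEN]; };                 /* master copy of the LVB */
struct holder   { enum mode mode; char lvb[LVB_LEN]; }; /* caller's lock and buffer */

/* Decide what happens to the LVB when a lock held at 'from' is granted at 'to'
 * (simplified rule, see the assumptions above). */
static enum lvb_action lvb_rule(enum mode from, enum mode to)
{
    if ((from == MODE_PW || from == MODE_EX) && to <= from)
        return LVB_WRITE; /* holder's buffer -> resource */
    if (to >= from)
        return LVB_READ;  /* resource -> holder's buffer */
    return LVB_NONE;      /* down-conversion from a mode that cannot write */
}

/* Apply a conversion in the model, moving LVB bytes as the rule dictates. */
static enum lvb_action convert(struct resource *r, struct holder *h, enum mode to)
{
    enum lvb_action action = lvb_rule(h->mode, to);

    if (action == LVB_WRITE)
        memcpy(r->lvb, h->lvb, LVB_LEN);
    else if (action == LVB_READ)
        memcpy(h->lvb, r->lvb, LVB_LEN);
    h->mode = to;
    return action;
}
```

In this model the sequence from the test output (NL to PW, then PW to CR with a write, then CR to PW with a read) ends with the holder's buffer containing the value written earlier, which is what the failing assertion in the test was checking for.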
> In the following lvmlockd code, you can see an example of working around
> that bug if you don't have immediate access to a newer kernel:
>
> https://git.fedorahosted.org/cgit/lvm2.git/tree/daemons/lvmlockd/lvmlockd-dlm.c

Indeed, I have to work with not-so-recent distributions and their kernels, so a workaround is much needed.

Adding a similar workaround in my test does help! But only with a single process, because with more I quickly get a conversion deadlock error. :( I will need to think more about this.

Thanks a lot for the pointers!

Cheers,
JM

-- 
saffroy at gmail.com

From wferi at niif.hu Sat Dec 17 20:18:04 2016
From: wferi at niif.hu (Ferenc Wágner)
Date: Sat, 17 Dec 2016 21:18:04 +0100
Subject: [Linux-cluster] Status of Git DLM
Message-ID: <87shpmux6r.fsf@lant.ki.iif.hu>

Hi David,

Is the current DLM HEAD (d5d7b8dd) stable enough for packaging? If so, could you please tag and release it?
-- 
Thanks,
Feri

From saffroy at gmail.com Sun Dec 18 19:42:49 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Sun, 18 Dec 2016 20:42:49 +0100 (CET)
Subject: [Linux-cluster] DLM user API for blocking AST
Message-ID:

Hi,

Continuing with my experiments with the DLM user API, I am trying to use blocking AST callbacks, and find that the rules for the lifetime and ownership of the dlm_lksb struct are a bit surprising. This led me to some investigations, and the question at the end of this email.

It looks like the kernel remembers the pointer to the lksb struct used to issue the dlm_lock call, and libdlm happily overwrites this piece of memory whenever the kernel issues an event related to that lock, including just before firing a BAST callback. It is a bit frustrating because I got caught by surprise wondering why something was smashing my stack, ie. the place where I had once laid out my dlm_lksb, thinking that it was okay to release its memory after the completion AST callback has completed.
For now I have (apparently) working test code that deals with this in the following way: for a given lock (identified by its lockid), I keep two dlm_lksb structs and a bit indicating which of the two is free to use for conversions. I update the bit every time the CAST (not BAST) callback completes, thus doing some kind of double buffering.

So I assume that:

- each lock acquisition or conversion call gives ownership of the lksb to the kernel and libdlm (because a BAST callback can fire at any time and will overwrite the struct), causing the kernel/libdlm to forget about the previously owned lksb (meaning the caller can/should then dispose of it)

- AST and BAST callbacks run in order, such that after the CAST completes, and until a conversion occurs, a BAST firing will only overwrite the lksb given on the last lock or conversion

Are my assumptions correct?

Cheers,
Jean-Marc

-- 
saffroy at gmail.com

From saffroy at gmail.com Sun Dec 18 19:46:25 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Sun, 18 Dec 2016 20:46:25 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
Message-ID:

Hi (again),

Another question I have regarding DLM and Corosync (because Corosync is required to use DLM): should I expect compatibility across versions?

I did a quick test between distributions running different kernels (CentOS 6, Centos7 and Ubuntu 14) but rather close versions of Corosync, and that test worked, but I am not sure if that was just luck.
;)

Cheers,
Jean-Marc

-- 
saffroy at gmail.com

From teigland at redhat.com Mon Dec 19 17:59:42 2016
From: teigland at redhat.com (David Teigland)
Date: Mon, 19 Dec 2016 11:59:42 -0600
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To:
References:
Message-ID: <20161219175942.GB13720@redhat.com>

On Sun, Dec 18, 2016 at 08:42:49PM +0100, Jean-Marc Saffroy wrote:
> Hi,
>
> Continuing with my experiments with the DLM user API, I am trying to use
> blocking AST callbacks, and find that the rules for the lifetime and
> ownership of the dlm_lksb struct are a bit surprising. This led me to some
> investigations, and the question at the end of this email.

Hi, you're discovering just how old and crusty this userland interface is. Sorry about that :)

This userland interface is left over from the earliest dlm implementation, which could generously be called experimental. It has long needed a thorough redesign, but because the dlm is not heavily used from userland, and because user/kernel interfaces are hard, it's never been done.

> It looks like the kernel remembers the pointer to the lksb struct used to
> issue the dlm_lock call, and libdlm happily overwrites this piece of
> memory whenever the kernel issues an event related to that lock, including
> just before firing a BAST callback. It is a bit frustrating because I got
> caught by surprise wondering why something was smashing my stack, ie. the
> place where I had once laid out my dlm_lksb, thinking that it was okay to
> release its memory after the completion AST callback has completed.

The way the kernel saves and restores the pointers is very unpleasant, and handling lifetimes of lock structs/memory is a big pain. In a quick look at this code, I'm not seeing any simple or obvious ways to avoid the lksb behavior you're describing, but it's been a while since I was very familiar with this area. With more study, it's possible that a fix could be found, but it seems a bit unlikely.
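The double-buffering scheme Jean-Marc describes in this thread (two lksb structs per lock, plus a bit that flips on every completion AST) might be organised roughly as below. This is only a sketch of the bookkeeping: `struct lksb_model` is a reduced stand-in for libdlm's `struct dlm_lksb`, and `lksb_pair`, `pair_begin_request()`, `pair_on_cast()` and `pair_current()` are hypothetical names. No libdlm call is made, and the scheme rests on the ordering assumptions stated in the thread, which have not been confirmed against the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reduced stand-in for libdlm's struct dlm_lksb (only what this sketch needs). */
struct lksb_model {
    int      sb_status;
    uint32_t sb_lkid;
};

/* Two status blocks per lock: one possibly still written by the kernel/libdlm,
 * one free for the next request. Keep this struct in long-lived storage, not
 * on a stack frame that may return while the lock exists. */
struct lksb_pair {
    struct lksb_model slot[2];
    int free_idx; /* index of the slot that is safe to hand out next */
    int pending;  /* 1 while a lock or conversion request is in flight */
};

static void pair_init(struct lksb_pair *p)
{
    memset(p, 0, sizeof(*p));
}

/* Pick the lksb to pass to the next lock or conversion call. Returns NULL if
 * a request is still in flight, because both slots are then spoken for. */
static struct lksb_model *pair_begin_request(struct lksb_pair *p)
{
    struct lksb_model *next;

    if (p->pending)
        return NULL;
    next = &p->slot[p->free_idx];
    /* A conversion must name the same lock, so carry the lock id across. */
    next->sb_lkid = p->slot[!p->free_idx].sb_lkid;
    p->pending = 1;
    return next;
}

/* Call from the completion AST (not the BAST): the slot just used is now the
 * one a later BAST may overwrite, so the other slot becomes the free one. */
static void pair_on_cast(struct lksb_pair *p)
{
    p->free_idx = !p->free_idx;
    p->pending = 0;
}

/* The lksb that holds the result of the most recently completed request. */
static struct lksb_model *pair_current(struct lksb_pair *p)
{
    return &p->slot[!p->free_idx];
}
```

A caller would pass the pointer returned by `pair_begin_request()` as the lksb argument of the real lock or conversion call and invoke `pair_on_cast()` at the end of its completion AST handler.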
As a workaround to avoid an unwanted bast callback after a completion, I wonder if you could make a no-op call with NULL astaddr/astarg to prevent any further callback using those?

> For now I have (apparently) working test code that deals with this in the
> following way: for a given lock (identified by its lockid), I keep two
> dlm_lksb structs and a bit indicating which of the two is free to use for
> conversions. I update the bit every time the CAST (not BAST) callback
> completes, thus doing some kind of double buffering.

OK, I don't know enough about the details to say whether there are any subtle issues with this or not.

> So I assume that:
>
> - each lock acquisition or conversion call gives ownership of the lksb to
> the kernel and libdlm (because a BAST callback can fire at any time and
> will overwrite the struct), causing the kernel/libdlm to forget about the
> previously owned lksb (meaning the caller can/should then dispose of it)

That sounds about right.

> - AST and BAST callbacks run in order, such that after the CAST completes,
> and until a conversion occurs, a BAST firing will only overwrite the lksb
> given on the last lock or conversion
>
> Are my assumptions correct?

That also sounds like it should be true.

To say with more certainty would require closer study of the code, because whatever rules exist are a function of the current implementation, and not derived from higher design rules per se.

Dave

From teigland at redhat.com Mon Dec 19 18:14:39 2016
From: teigland at redhat.com (David Teigland)
Date: Mon, 19 Dec 2016 12:14:39 -0600
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
In-Reply-To:
References:
Message-ID: <20161219181439.GC13720@redhat.com>

On Sun, Dec 18, 2016 at 08:46:25PM +0100, Jean-Marc Saffroy wrote:
> Hi (again),
>
> Another question I have regarding DLM and Corosync (because Corosync is
> required to use DLM): should I expect compatibility across versions?
>
> I did a quick test between distributions running different kernels (CentOS
> 6, Centos7 and Ubuntu 14) but rather close versions of Corosync, and that
> test worked, but I am not sure if that was just luck. ;)

I can only speak for the dlm part of that. Between different distributions, I'd call it luck :)

Within the context of one distribution things shouldn't break if the distribution is doing its job.

Upstream, with no distribution context, I'm certainly aware of when compatibility breaks between dlm_controld and corosync and between different dlm_controld versions on nodes. I try to avoid it, but there are unpredictable reasons that it can break.

Dave

From saffroy at gmail.com Tue Dec 20 01:15:28 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 02:15:28 +0100 (CET)
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To: <20161219175942.GB13720@redhat.com>
References: <20161219175942.GB13720@redhat.com>
Message-ID:

On Mon, 19 Dec 2016, David Teigland wrote:

> As a workaround to avoid an unwanted bast callback after a completion, I
> wonder if you could make a no-op call with NULL astaddr/astarg to prevent
> any further callback using those?

I assume that what you call a no-op is a lock conversion towards the same mode as before, correct?

Then I think for this to work we need the second assumption I made, ie. the kernel should not deliver the bast event to userland after it delivered the cast for the last conversion.

[... assumptions ...]

> That also sounds like it should be true.
>
> To say with more certainty would require closer study of the code, because
> whatever rules exist are a function of the current implementation, and not
> derived from higher design rules per se.

Ok, I understand. Thank you David!
Cheers,
JM

-- 
saffroy at gmail.com

From saffroy at gmail.com Tue Dec 20 01:56:51 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 02:56:51 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
In-Reply-To: <20161219181439.GC13720@redhat.com>
References: <20161219181439.GC13720@redhat.com>
Message-ID:

On Mon, 19 Dec 2016, David Teigland wrote:

> On Sun, Dec 18, 2016 at 08:46:25PM +0100, Jean-Marc Saffroy wrote:
> > Hi (again),
> >
> > Another question I have regarding DLM and Corosync (because Corosync is
> > required to use DLM): should I expect compatibility across versions?
> >
> > I did a quick test between distributions running different kernels (CentOS
> > 6, Centos7 and Ubuntu 14) but rather close versions of Corosync, and that
> > test worked, but I am not sure if that was just luck. ;)
> >
> I can only speak for the dlm part of that. Between different
> distributions, I'd call it luck :)

Ah, so that could be a serious problem for me. I hoped to be able to use dlm across distributions without having to qualify each possible combination...

> Within the context of one distribution things shouldn't break if the
> distribution is doing its job.

Does that mean that, for example, I could expect dlm instances in RHEL6 and RHEL7 kernels to work together?

> Upstream, with no distribution context, I'm certainly aware of when
> compatibility breaks between dlm_controld and corosync and between
> different dlm_controld versions on nodes. I try to avoid it, but there
> are unpredictable reasons that it can break.

How could instances of dlm_controld interact badly? I thought they were just glue between dlm and corosync, and never directly talk on the network. Do they have network-visible side effects on dlm/corosync?

In the end, I need to work across distributions and their kernels, but I could build from source a specific version of corosync and (the userland part of) dlm.
I expect that the kernel interface to dlm is stable (right?), so the biggest risk would be incompatibilities in the dlm protocol on the network. Is this protocol stable? With git I see that DLM_HEADER_MAJOR/MINOR macros changed very rarely in recent years but I can't tell if this is a good indicator.

Cheers,
JM

-- 
saffroy at gmail.com

From jfriesse at redhat.com Tue Dec 20 10:21:35 2016
From: jfriesse at redhat.com (Jan Friesse)
Date: Tue, 20 Dec 2016 11:21:35 +0100
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
In-Reply-To:
References:
Message-ID: <5859062F.4080109@redhat.com>

> Hi (again),
>
> Another question I have regarding DLM and Corosync (because Corosync is
> required to use DLM): should I expect compatibility across versions?

I will add just the Corosync information. Corosync releases with the same major version are compatible (i.e. 2.3.0 works with 2.4.0), but releases with different major versions are not (so no luck with 1.4.2 vs 2.3.0). In 3.x we plan to (finally) implement a protocol (instead of sending 64-bit aligned C structures as it is now), so backwards/forwards compatibility should improve (but it also means that 3.x is going to be incompatible with 2.x and 1.x).

As for distribution versions: at least Fedora/RHEL has no extra patches for Corosync (even though we may backport some fixes/features, we never break compatibility with upstream). I believe most other distros work in a similar way. So as long as you keep Corosync versions close, Corosync works.

>
> I did a quick test between distributions running different kernels (CentOS
> 6, Centos7 and Ubuntu 14) but rather close versions of Corosync, and that
> test worked, but I am not sure if that was just luck.
;)
>
>
> Cheers,
> Jean-Marc
>

From saffroy at gmail.com Tue Dec 20 10:33:59 2016
From: saffroy at gmail.com (Jean-Marc Saffroy)
Date: Tue, 20 Dec 2016 11:33:59 +0100 (CET)
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
In-Reply-To: <5859062F.4080109@redhat.com>
References: <5859062F.4080109@redhat.com>
Message-ID:

Thanks Jan for these details!

Cheers,
JM

-- 
saffroy at gmail.com

From teigland at redhat.com Tue Dec 20 14:55:13 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 20 Dec 2016 08:55:13 -0600
Subject: [Linux-cluster] DLM user API for blocking AST
In-Reply-To:
References: <20161219175942.GB13720@redhat.com>
Message-ID: <20161220145513.GA4800@redhat.com>

> > As a workaround to avoid an unwanted bast callback after a completion, I
> > wonder if you could make a no-op call with NULL astaddr/astarg to prevent
> > any further callback using those?
>
> I assume that what you call a no-op is a lock conversion towards the same
> mode as before, correct?

That should work.

> Then I think for this to work we need the second assumption I made, ie.
> the kernel should not deliver the bast event to userland after it
> delivered the cast for the last conversion.

The suggestion just depended on clearing the astaddr/astarg values in the kernel (which will itself have some effect on callbacks being sent or not). I hoped that might be a way to avoid analyzing callback ordering.

Dave

From teigland at redhat.com Tue Dec 20 15:13:54 2016
From: teigland at redhat.com (David Teigland)
Date: Tue, 20 Dec 2016 09:13:54 -0600
Subject: [Linux-cluster] Protocol compatibility of DLM/Corosync across versions
In-Reply-To:
References: <20161219181439.GC13720@redhat.com>
Message-ID: <20161220151354.GB4800@redhat.com>

On Tue, Dec 20, 2016 at 02:56:51AM +0100, Jean-Marc Saffroy wrote:
> Ah, so that could be a serious problem for me. I hoped to be able to use
> dlm across distributions without having to qualify each possible
> combination...
>
> > Within the context of one distribution things shouldn't break if the
> > distribution is doing its job.
>
> Does that mean that, for example, I could expect dlm instances in RHEL6
> and RHEL7 kernels to work together?

I'm not sure you're talking to the right person any more. In the history of dlm in RHEL, the userland bits have usually only worked together within one major release.

> How could instances of dlm_controld interact badly? I thought they were
> just glue between dlm and corosync, and never directly talk on the
> network. Do they have network-visible side effects on dlm/corosync?

dlm_controld instances communicate using a protocol that could change. This is for coordinating lockspace recovery, not performing locking.

> In the end, I need to work across distributions and their kernels, but I
> could build from source a specific version of corosync and (the userland
> part of) dlm. I expect that the kernel interface to dlm is stable
> (right?),

right

> so the biggest risk would be incompatibilities in the dlm
> protocol on the network. Is this protocol stable? With git I see that
> DLM_HEADER_MAJOR/MINOR macros changed very rarely in recent years but I
> can't tell if this is a good indicator.

It's quite stable. Your situation sounds very unique. If you can choose and build userspace code yourself, then you don't need to worry about distributions I guess.

Dave
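The two version rules that come up in this thread can be written down as small checks. The sketch below is illustrative only: `corosync_compatible()` encodes the rule Jan Friesse gives (same major version means compatible), while `dlm_header_compatible()` and `HDR_MAJOR_MASK` are hypothetical and assume that a DLM header version keeps its major number in the upper 16 bits and its minor number in the lower 16 bits. That layout is an assumption to check against fs/dlm in the kernel tree actually in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rule from the thread: Corosync releases with the same major version
 * interoperate (2.3.0 with 2.4.0), different majors do not (1.4.2 vs 2.3.0). */
static int corosync_compatible(const char *a, const char *b)
{
    /* strtol stops parsing at the first '.', which leaves the major number. */
    long major_a = strtol(a, NULL, 10);
    long major_b = strtol(b, NULL, 10);

    return major_a == major_b;
}

/* Assumed layout: major version in the upper 16 bits, minor in the lower 16. */
#define HDR_MAJOR_MASK 0xFFFF0000u

/* Two nodes are treated as compatible when their header majors match;
 * a differing minor alone is tolerated in this model. */
static int dlm_header_compatible(uint32_t local, uint32_t remote)
{
    return (local & HDR_MAJOR_MASK) == (remote & HDR_MAJOR_MASK);
}
```

For example, `corosync_compatible("2.3.0", "2.4.0")` is true and `corosync_compatible("1.4.2", "2.3.0")` is false, matching the examples given in the thread. Neither check says anything about dlm_controld, whose own protocol David notes could change.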