[Linux-cluster] CLVM/GFS2 distributed locking

Stevo Slavić sslavic at gmail.com
Fri Dec 30 20:37:00 UTC 2011


After pulling the cables between the shared storage and foo01, foo01 gets
fenced. Here is some info from foo02 about the shared storage and the dlm
debug output (the lock file seems to remain locked):

root@foo02:/data/activemq_data# ls -li
total 276
 66467 -rw-r--r-- 1 root root 33030144 Dec 30 16:32 db-1.log
 66468 -rw-r--r-- 1 root root    73728 Dec 30 16:24 db.data
 66470 -rw-r--r-- 1 root root    53344 Dec 30 16:24 db.redo
128014 -rw-r--r-- 1 root root        0 Dec 30 19:49 dummy
 66466 -rw-r--r-- 1 root root        0 Dec 30 16:23 lock
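
The inode number of the "lock" file, 66466, is 103a2 in hex; that's the
pattern grepped for in the dlm debug output below:

root@foo02:/data/activemq_data# printf '%x\n' 66466
103a2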
root@foo02:/data/activemq_data# grep -A 7 -i 103a2 /debug/dlm/activemq
Resource ffff81090faf96c0 Name (len=24) "       2           103a2"
Master Copy
Granted Queue
03d10002 PR Remote:   1 00c80001
00e00001 PR
Conversion Queue
Waiting Queue
--
Resource ffff81090faf97c0 Name (len=24) "       5           103a2"
Master Copy
Granted Queue
03c30003 PR Remote:   1 039a0001
03550001 PR
Conversion Queue
Waiting Queue
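
My reading of the fields so far, pieced together from the gfs2/dlm sources,
so treat it as a guess:

  "       2           103a2"  =  glock type 2 (inode) for inode 0x103a2, the lock file
  "       5           103a2"  =  glock type 5 (iopen) for the same inode
  PR                          =  lock granted in protected-read mode
  Remote:   1                 =  lock copy held on behalf of nodeid 1 (foo01)

If that's right, foo02 masters both resources ("Master Copy"), and foo01's
PR locks are still on the granted queue even though it was fenced.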


Are there any docs for interpreting this dlm debug output?
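
(In the meantime, since the ActiveMQ lock file is held via a posix lock, I
guess something like "dlm_tool plocks activemq", or "group_tool dump plocks
activemq" on older cluster versions, would show which node/pid holds it; I
haven't verified that yet.)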


Regards,
Stevo.

On Fri, Dec 30, 2011 at 9:23 PM, Digimer <linux at alteeve.com> wrote:

> On 12/30/2011 03:08 PM, Stevo Slavić wrote:
> > Hi Digimer and Yvette,
> >
> > Thanks for the tips! I don't doubt the reliability of the technology; I
> > just want to make sure it is configured well.
> >
> > After fencing a node that held a lock on a file on shared storage, the
> > lock remains, and the non-fenced node cannot take over the lock on that
> > file. I'm wondering how one can check which process (and from which
> > node, if possible) is holding a lock on a file on shared storage.
> > DLM should have taken care of releasing the lock once the node got
> > fenced, right?
> >
> > Regards,
> > Stevo.
>
> After a successful fence call, DLM will clean up any locks held by the
> lost node. That's why it's so critical that the fence action succeeds
> (i.e., test, test, test). If a node doesn't actually die in a fence, but
> the cluster thinks it did, and somehow the lost node returns, the lost
> node will think its locks are still valid and modify shared storage,
> leading to near-certain data corruption.
>
> It's all perfectly safe, provided you've tested your fencing properly. :)
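>
> For example (assuming a cman-based cluster), you can trigger a fence by
> hand from a healthy node:
>
>   fence_node foo01
>
> then confirm the cluster agreed on the fence and that DLM recovered, e.g.
> with 'cman_tool nodes' and 'dlm_tool ls'; once recovery completes, the
> lockspace should list only the surviving members.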
>
> Yvette,
>
>  You might be right on the 'noatime' implying 'nodiratime'... I add
> both out of habit.
>
> --
> Digimer
> E-Mail:              digimer at alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "omg my singularity battery is dead again.
> stupid hawking radiation." - epitron
>