[Cluster-devel] (no subject)

eric zren at suse.com
Tue Oct 13 10:07:14 UTC 2015


Hi David and list,

I'm working on ocfs2 and have encountered a problem with DLM POSIX file locks.
After some investigation, I'd like to share what I found and get some hints
from you.

Environment:
    kernel: 3.12.47
    FS: OCFS2
    stack: pacemaker
    cluster: 2 testing nodes, node1, node2

Issue description:
The ocfs2 test suite contains a deadlock test case for file locks. The test
first prepares a test file, file1, on the shared disk. Then node1 calls
"fcntl(file1, F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0})", and node2 sets alarm(10s)
and also calls "fcntl(file1, F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0})".
The test expects the alarm to expire and deliver SIGALRM, waking up the
sleeping process on node2, as "man fcntl" says: "If a signal is caught while
waiting, then the call is interrupted and (after the signal handler has
returned) returns immediately (with return value -1 and errno set to EINTR)".

However, ps showed the process on node2 in the "Dl" state, and the signal did
not wake it up. As a result, the test case hung forever.

Investigation:
* Key debug info:
process stack on node1:

n1:/opt/ocfs2-test/bin # cat /proc/22677/stack
[<ffffffff8104250b>] kvm_clock_get_cycles+0x1b/0x20
[<ffffffff810ba924>] __getnstimeofday+0x34/0xc0
[<ffffffff810ba9ba>] getnstimeofday+0xa/0x30
[<ffffffff811bb30d>] SyS_poll+0x5d/0xf0
[<ffffffff81529809>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

process stack on node2:
n2:~ # cat /proc/1534/stack
[<ffffffffa050fa65>] dlm_posix_lock+0x185/0x380 [dlm]
[<ffffffff811f39ce>] fcntl_setlk+0x12e/0x2d0
[<ffffffff811b8231>] SyS_fcntl+0x261/0x510
[<ffffffff81529809>] system_call_fastpath+0x16/0x1b
[<00007f3f5721eb42>] 0x7f3f5721eb42
[<ffffffffffffffff>] 0xffffffffffffffff

* dlm_posix_lock
By adding printk calls and recompiling the dlm kernel module, I located where
n2 hangs:
      dlm_posix_lock -> wait_event_killable
wait_event_killable puts the process into the TASK_KILLABLE state, which is like
TASK_UNINTERRUPTIBLE except that it can be woken up by fatal signals. I ran some
tests: SIGTERM can wake it up, but SIGALRM cannot.

Does this go against POSIX file lock semantics? Any hints would be much appreciated!
I can provide more information if needed ;-)

Thanks,
Eric




