[Linux-cluster] F_SETLK fails after recovery

Mon Sep 8 14:44:49 UTC 2014

Following up on the problem I described last week: the node that keeps running when NODE1 fails (NODE2) shows many entries like these in its dlm_tool log_plocks output:

1410147734 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147734 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147734 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147736 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147736 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147736 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147738 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147738 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147738 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147740 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147740 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147740 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147742 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147742 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147742 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0

None of these lock entries has a corresponding unlock entry. NODE1 is brought down with init 6, and when it restarts it gets as far as "Starting cman" before NODE2 fences it (I assume we need a higher post_join_delay). When the node is fenced I see:

1410147774 clvmd purged 0 plocks for 1
1410147774 lvclusdidiz0360 purged 3 plocks for 1
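
For anyone wanting to run the same check on their own log_plocks output, here is a rough Python sketch that tallies "receive plock" lock entries that are never followed by an unlock from the same owner. The line layout in the regular expression is inferred only from the excerpt earlier in this message, and the "UN" operation string for unlocks is an assumption on my part (no unlock line appears in my logs), so treat both as hypothetical. On the entries above it reports three distinct outstanding locks from node 1, which appears to line up with the "purged 3 plocks for 1" line.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the log excerpt in this message:
#   <time> <lockspace> receive plock <number> <op> <mode> <start>-<end> <nodeid>/<pid>/<owner> w <wait>
PLOCK_RE = re.compile(
    r"^(?P<time>\d+) (?P<ls>\S+) receive plock (?P<number>[0-9a-f]+) "
    r"(?P<op>\S+) (?P<mode>\S+) (?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) "
    r"(?P<nodeid>\d+)/(?P<pid>\d+)/(?P<owner>[0-9a-f]+) w (?P<wait>\d+)$"
)


def unmatched_locks(lines, unlock_op="UN"):
    """Return {(number, nodeid, pid, owner): LK entries seen since the last unlock}.

    unlock_op is an assumption: the excerpt only ever shows "LK".
    """
    outstanding = defaultdict(int)
    for line in lines:
        m = PLOCK_RE.match(line.strip())
        if m is None:
            continue  # not a "receive plock" line, ignore it
        key = (m.group("number"), m.group("nodeid"),
               m.group("pid"), m.group("owner"))
        if m.group("op") == "LK":
            outstanding[key] += 1
        elif m.group("op") == unlock_op:
            outstanding.pop(key, None)  # an unlock clears the tally
    return dict(outstanding)


if __name__ == "__main__":
    import sys
    # Usage: dlm_tool log_plocks | python this_script.py
    for key, count in sorted(unmatched_locks(sys.stdin).items()):
        print(key, count)
```

This ignores lock ranges and modes entirely, so it is only a coarse indicator, not a model of how dlm_controld tracks plocks.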

So it looks like NODE2 attempted some cleanup, but when NODE1 then attempts to join, NODE2 examines the lockspace and reports the following:

1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78067.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78068.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78059.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r88464.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r88478.0"
1410147820 lvclusdidiz0360 store_plocks first 66307 last 88478 r_count 45 p_count 63 sig 5ab0
1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags a sig 5ab0 need_plocks 0

So NODE2 believes NODE1 will have 45 plocks to process when it comes back. NODE1 receives that plock information:

1410147820 lvclusdidiz0360 set_plock_ckpt_node from 0 to 2
1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags a sig 5ab0 need_plocks 1

However, when NODE1 attempts to retrieve plocks it reports:

1410147820 lvclusdidiz0360 retrieve_plocks
1410147820 lvclusdidiz0360 retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0

Because sig 0 does not match sig 5ab0, plocks get disabled, and the F_SETLK operation on the gfs2 target will fail on NODE1.
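
To make the comparison between the two summary lines less error-prone, here is a small Python sketch that parses a store_plocks line and a retrieve_plocks line and reports which fields differ. The field layout is taken only from the lines quoted in this message, and treating sig as a hexadecimal value is my assumption; the helper names are hypothetical and not part of dlm_controld.

```python
import re

# Assumed layout of the summary lines, taken from the excerpt in this message.
SUMMARY_RE = re.compile(
    r"^(?P<time>\d+) (?P<ls>\S+) (?P<kind>store_plocks|retrieve_plocks) "
    r"first (?P<first>\d+) last (?P<last>\d+) "
    r"r_count (?P<r_count>\d+) p_count (?P<p_count>\d+) sig (?P<sig>[0-9a-f]+)$"
)

FIELDS = ("first", "last", "r_count", "p_count", "sig")


def parse_summary(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    for key in ("time", "first", "last", "r_count", "p_count"):
        d[key] = int(d[key])
    d["sig"] = int(d["sig"], 16)  # assumption: sig is printed in hex
    return d


def summary_diff(stored_line, retrieved_line):
    """Return {field: (stored, retrieved)} for every field that differs."""
    stored = parse_summary(stored_line)
    retrieved = parse_summary(retrieved_line)
    if stored is None or retrieved is None:
        raise ValueError("line does not look like a plocks summary")
    return {f: (stored[f], retrieved[f])
            for f in FIELDS if stored[f] != retrieved[f]}


if __name__ == "__main__":
    stored = ("1410147820 lvclusdidiz0360 store_plocks first 66307 last 88478 "
              "r_count 45 p_count 63 sig 5ab0")
    retrieved = ("1410147820 lvclusdidiz0360 retrieve_plocks first 0 last 0 "
                 "r_count 0 p_count 0 sig 0")
    for field, (s, r) in summary_diff(stored, retrieved).items():
        print(f"{field}: stored={s} retrieved={r}")
```

In my case every field differs, which suggests NODE1 is reading an empty checkpoint rather than a partially transferred one.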

I am trying to understand the checkpointing process and where this information is actually being retrieved from.

Neale