[Linux-cluster] F_SETLK fails after recovery

Neale Ferguson neale at sinenomine.net
Mon Sep 8 14:44:49 UTC 2014


Further to the problem I described last week: on the node that keeps running when NODE1 fails (NODE2), the dlm_tool log_plocks output contains many entries like these:

1410147734 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147734 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147734 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147736 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147736 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147736 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147738 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147738 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147738 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147740 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147740 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147740 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0
1410147742 lvclusdidiz0360 receive plock 10303 LK WR 0-7fffffffffffffff 1/8112/13d5000 w 0
1410147742 lvclusdidiz0360 receive plock 1305c LK WR 0-7fffffffffffffff 1/8147/1390400 w 0
1410147742 lvclusdidiz0360 receive plock 50081 LK WR 0-7fffffffffffffff 1/8182/7ce04400 w 0

i.e. lock requests with no corresponding unlock entry. NODE1 is brought down with init 6, and when it restarts it gets as far as "Starting cman" before NODE2 fences it (I assume we need a higher post_join_delay). When the node is fenced I see:

1410147774 clvmd purged 0 plocks for 1
1410147774 lvclusdidiz0360 purged 3 plocks for 1
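The receive plock and purge lines above can be checked with a small Python sketch: one function tallies LK requests that never see a later unlock, and another drops every record held by a failed node and returns a count in the shape of the "purged N plocks for <nodeid>" lines. The field layout of a receive plock line (resource number in hex, op, mode, start-end range, nodeid/pid/owner, wait flag) is inferred from the output above, "UN" as the unlock op name is an assumption, and the record tuples used by purge_node are hypothetical; none of this is taken from the dlm_controld source.

```python
import re
from collections import Counter

# Assumed layout of a "receive plock" line, inferred from the log output:
# <time> <lockspace> receive plock <number> <op> <mode> <start>-<end> <nodeid>/<pid>/<owner> w <wait>
PLOCK_RE = re.compile(
    r"^(?P<time>\d+) (?P<ls>\S+) receive plock (?P<number>[0-9a-f]+) "
    r"(?P<op>\w+) (?P<mode>\w+) (?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) "
    r"(?P<nodeid>\d+)/(?P<pid>\d+)/(?P<owner>[0-9a-f]+) w (?P<wait>\d+)$"
)


def unmatched_locks(lines):
    """Tally LK requests per (resource, nodeid, owner) with no later UN.

    "UN" as the unlock op name is an assumption.
    """
    outstanding = Counter()
    for line in lines:
        m = PLOCK_RE.match(line.strip())
        if m is None:
            continue  # not a receive plock entry (e.g. a purge line)
        key = (int(m["number"], 16), int(m["nodeid"]), int(m["owner"], 16))
        if m["op"] == "LK":
            outstanding[key] += 1
        elif m["op"] == "UN":
            outstanding.pop(key, None)  # an unlock clears that owner's requests
    return outstanding


def purge_node(plocks, nodeid):
    """Drop every plock record held by a failed node.

    `plocks` is a hypothetical list of (resource, nodeid, owner) tuples.
    Returns (remaining records, number purged).
    """
    remaining = [p for p in plocks if p[1] != nodeid]
    return remaining, len(plocks) - len(remaining)
```

Fed the fifteen receive plock lines above, unmatched_locks reports three keys, all for nodeid 1, each with five LK requests and no unlock.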

So it looks like it tried to do some cleanup, but then, when NODE1 attempts to rejoin, NODE2 examines the lockspace and reports the following:

1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78067.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78068.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r78059.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r88464.0"
1410147820 lvclusdidiz0360 wr sect ro 0 rf 0 len 40 "r88478.0"
1410147820 lvclusdidiz0360 store_plocks first 66307 last 88478 r_count 45 p_count 63 sig 5ab0
1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags a sig 5ab0 need_plocks 0
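The store_plocks summary line can be modelled with a hedged Python sketch under these assumptions, none taken from the dlm_controld source: each resource holding at least one plock is written as one checkpoint section named r<number>.0, with the resource number in decimal (66307 is 0x10303, the first resource in the receive plock lines above); "first" and "last" are the first and last resources written rather than the numeric minimum and maximum (the wr sect lines above are not in numeric order); and r_count and p_count count resources and plocks respectively. The function name and input mapping are hypothetical.

```python
def store_summary(resources):
    """Summarise plock state in the shape of the store_plocks log line.

    `resources` is a hypothetical mapping of resource number -> list of plocks,
    iterated in insertion order (the order sections would be written).
    """
    written = [(number, locks) for number, locks in resources.items() if locks]
    return {
        # One checkpoint section per resource that holds at least one plock.
        "sections": [f"r{number}.0" for number, _ in written],
        "first": written[0][0] if written else 0,
        "last": written[-1][0] if written else 0,
        "r_count": len(written),
        "p_count": sum(len(locks) for _, locks in written),
    }
```

With an empty mapping the sketch returns first 0, last 0, r_count 0 and p_count 0, the same values NODE1 later logs from retrieve_plocks.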

So it believes NODE1 will have 45 resources (63 plocks) to process when it comes back. NODE1 receives that plock information:

1410147820 lvclusdidiz0360 set_plock_ckpt_node from 0 to 2
1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags a sig 5ab0 need_plocks 1

However, when NODE1 attempts to retrieve plocks it reports:

1410147820 lvclusdidiz0360 retrieve_plocks
1410147820 lvclusdidiz0360 retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
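The consistency check that trips here can be sketched in Python. The signature function below is a hypothetical 16-bit additive sum over each record's fields, chosen only because an empty set sums to 0, like the sig 0 that retrieve_plocks reports; dlm_controld's real sig computation is not shown in these logs. The Plock record and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plock:
    number: int  # resource number
    start: int
    end: int
    nodeid: int
    pid: int
    owner: int


def plock_sig(plocks):
    """Hypothetical 16-bit additive signature (NOT dlm_controld's real formula)."""
    sig = 0
    for p in plocks:
        for value in (p.number, p.start, p.end, p.nodeid, p.pid, p.owner):
            sig = (sig + value) & 0xFFFF
    return sig


def plocks_usable(stored_sig, retrieved):
    """Compare the sig announced by the storing node with the retrieved data.

    Returns False on a mismatch, i.e. the case where plocks would be disabled.
    """
    return plock_sig(retrieved) == stored_sig
```

In this sketch an empty retrieval always yields 0, so against any non-zero stored signature such as 5ab0 the check fails.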

Because of the mismatch between sig 0 and sig 5ab0, plocks get disabled and F_SETLK operations on the gfs2 target then fail on NODE1.
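To confirm the symptom from user space, a small probe can issue the same kind of request: a non-blocking whole-file write lock. In Python, fcntl.lockf with LOCK_EX | LOCK_NB issues fcntl(F_SETLK) with F_WRLCK, and a length of 0 means "to end of file", which the kernel records as the 0-7fffffffffffffff range seen in the log. The function name is illustrative, and the logs do not show which errno is returned once plocks are disabled.

```python
import fcntl
import os


def try_write_lock(path):
    """Try a non-blocking whole-file write lock; return None on success or the OSError."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)  # creates the file if missing
    try:
        # LOCK_EX | LOCK_NB makes CPython call fcntl(F_SETLK) with F_WRLCK;
        # len=0, start=0 covers the whole file (0-7fffffffffffffff).
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 0, 0, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
        return None
    except OSError as exc:
        return exc
    finally:
        os.close(fd)
```

For example, try_write_lock("/mnt/gfs2/locktest") on each node, where the path is a hypothetical file on the gfs2 mount, should return None on NODE2 and an OSError on NODE1 while plocks are disabled.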

I'm trying to understand the checkpointing process and where this information is actually being retrieved from.

Neale




