[Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

Bob Peterson rpeterso at redhat.com
Fri Feb 1 14:51:21 UTC 2019


Hi Ross,

----- Original Message -----
> Do you have any suggestions for tracking down the root cause?

One time, I had a similar problem in rhel7 and couldn't use
kernel tracing because there were millions of glocks involved.
The trace was too huge and quickly swamped the biggest possible
kernel trace buffer. So I ended up writing the ugly, hacky patch
that's attached. Perhaps you can use it as a starting point.

The idea is: every time there's a get or a put on a glock, it
saves off a 1-byte identifier of which function did the get/put.
It saves it in a new 64-byte field kept for each glock, which of
course means the slab becomes much bigger, but it was never meant
to be shipped, right?
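
To make the idea concrete, here is a minimal user-space sketch. It is
not the attached patch: the struct, the field names, the call-site
identifiers, and the wrap-around (ring buffer) behaviour are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HIST_LEN 64  /* the 64-byte per-glock history field */

/* Hypothetical one-byte call-site identifiers. A real patch would
 * assign one to every function that takes or drops a reference. */
enum site { SITE_NONE = 0, SITE_GET_LOOKUP = 'a', SITE_PUT_WORK = 'A' };

/* Stand-in for the real glock; only the debugging fields are shown. */
struct dbg_glock {
	int ref;                 /* plain int here, not the kernel's counter */
	uint8_t hist[HIST_LEN];  /* get/put history, one byte per event */
	unsigned int hist_next;  /* total number of events recorded */
};

/* Store one identifier, overwriting the oldest entry once full. */
static void hist_record(struct dbg_glock *gl, uint8_t site)
{
	assert(gl != 0);
	gl->hist[gl->hist_next % HIST_LEN] = site;
	gl->hist_next++;
}

/* Wrappers that log the caller's identifier before touching the count. */
static void dbg_get(struct dbg_glock *gl, uint8_t site)
{
	hist_record(gl, site);
	gl->ref++;
}

static void dbg_put(struct dbg_glock *gl, uint8_t site)
{
	hist_record(gl, site);
	gl->ref--;
}
```

Each caller would pass its own identifier, e.g.
dbg_get(gl, SITE_GET_LOOKUP), so the history shows who did what.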

Then, when the problem occurred, it would dump out the problematic
glock, including the 64-byte get/put history value.
Then I would go through it and identify the history of what went
wrong.
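
The dump step could look roughly like the sketch below. It assumes the
history is kept as a ring buffer with a running event count, which is
an assumption for illustration and not necessarily how the attached
patch does it. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HIST_LEN 64  /* the 64-byte per-glock history field */

/* Copy the ring buffer into out[] oldest-first. total is the number
 * of events ever recorded. Returns the number of entries written. */
static size_t hist_unroll(const uint8_t *hist, unsigned int total,
			  uint8_t *out)
{
	size_t n = total < HIST_LEN ? total : HIST_LEN;
	size_t start = total < HIST_LEN ? 0 : total % HIST_LEN;

	assert(hist != 0 && out != 0);
	for (size_t i = 0; i < n; i++)
		out[i] = hist[(start + i) % HIST_LEN];
	return n;
}

/* Print the history as hex bytes in chronological order. */
static void hist_dump(const uint8_t *hist, unsigned int total)
{
	uint8_t ordered[HIST_LEN];
	size_t n = hist_unroll(hist, total, ordered);

	printf("get/put history (oldest first):");
	for (size_t i = 0; i < n; i++)
		printf(" %02x", ordered[i]);
	printf("\n");
}
```

Reading the bytes oldest-first makes it easier to spot the put that
has no matching get.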

Since this is a fairly old (2015) patch that targets an old rhel7,
it will obviously need a lot of updating to get it to work, but
it might work better than the kernel tracing, depending on how
many glocks are involved in your test.

Regards,

Bob Peterson
Red Hat File Systems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_put.patch
Type: text/x-patch
Size: 25569 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20190201/4bb1e1c2/attachment.bin>

