[Linux-cluster] Freeze with cluster-2.03.11

Wendy Cheng s.wendy.cheng at gmail.com
Sat Mar 28 03:36:16 UTC 2009

> I should get some sleep - but can't it be that I hit the potential
> deadlock mentioned here:

Please take my observation with a grain of salt (as I don't have Linux
source code in front of me to check the exact locking sequence, nor can I
afford spending time on this) ...

I don't see strong evidence of a deadlock in the thread backtraces (though
it could be one). However, assuming the cluster worked before, you could
have overloaded the e1000 driver in this case. There are suspicious page
faults, but memory looks fine. So one possibility is that GFS generated too
many sync requests and flooded the e1000. As a result, the cluster
heartbeat missed its interval. Do you use the same Ethernet card for both
AoE and cluster traffic? If yes, separate them and see how it goes. And of
course, if you don't have Ben's mmap patch (as you described in your post),
it is probably a good idea to get it into your gfs-kmod.

But honestly, I think running GFS1 on newer kernels is a bad idea.

-- Wendy

> commit  4787e11dc7831f42228b89ba7726fd6f6901a1e3
> gfs-kmod: workaround for potential deadlock. Prefault user pages
> The bug uncovered in 461770 does not seem fixable without a massive
> change to how gfs works.  There is a lock ordering mismatch between
> the process address space lock and the glocks. The only good way to
> avoid this in all cases is to not hold the glock for so long, which
> is what gfs2 does. This is impossible without completely changing
> how gfs does locking.  Fortunately, this is only a problem when you
> have multiple processes sharing an address space, and are doing IO
> to a gfs file with a userspace buffer that's part of an mmapped gfs
> file. In this case, prefaulting the buffer's pages immediately
> before acquiring the glocks significantly shortens the window for
> this deadlock. Closing the window any more causes a large
> performance hit.
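
To make the prefault idea in the commit message above concrete, here is a
rough userspace analogy in C. It is only a sketch and not the gfs-kmod
code: the names prefault_buffer, locked_copy and fake_glock are made up for
illustration, and a pthread mutex stands in for the glock. The point is the
ordering: touch every page of the user buffer first, so that page faults
are most likely taken before the lock is held rather than under it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a glock; real glocks are cluster-wide locks. */
static pthread_mutex_t fake_glock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read one byte at every page-size stride of buf, plus the last byte, so
 * that every page the buffer spans is faulted in now. Returns the number
 * of strides touched, which is ceil(len / page size).
 */
static size_t prefault_buffer(const volatile char *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* fallback is an assumption */
    size_t touched = 0;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;

    for (size_t off = 0; off < len; off += page) {
        (void)buf[off];      /* volatile read forces the access */
        touched++;
    }
    (void)buf[len - 1];      /* covers an unaligned buffer's final page */
    return touched;
}

/* Copy from a user buffer while holding the lock, prefaulting first. */
static size_t locked_copy(char *dst, const char *src, size_t len)
{
    prefault_buffer(src, len);        /* faults happen here, lock not held */

    pthread_mutex_lock(&fake_glock);
    memcpy(dst, src, len);            /* pages are most likely resident now */
    pthread_mutex_unlock(&fake_glock);
    return len;
}
```

As the commit message says, this only shortens the window: a page can
still be evicted between the prefault and the copy, so the fault can
still land under the lock.
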
> Mailman does mmap files...
> Best regards,
> Jozsef
> --
> E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: KFKI Research Institute for Particle and Nuclear Physics
>         H-1525 Budapest 114, POB. 49, Hungary