[Linux-cluster] Freeze with cluster-2.03.11
Kadlecsik Jozsef
kadlec at mail.kfki.hu
Tue Apr 7 14:01:08 UTC 2009
Hi,
On Mon, 6 Apr 2009, Kadlecsik Jozsef wrote:
> On Sun, 5 Apr 2009, Wendy Cheng wrote:
>
> > Based on code reading ...
> > 1. iput() gets inode_lock (a spin lock)
> > 2. iput() calls iput_final()
> > 3. iput_final() calls gfs_drop_inode() that calls
> > generic_drop_inode()
> > 4. generic_drop_inode() unlocks inode_lock.
> >
> > In theory, this logic violates the usage of a spin lock, as it is expected
> > to be held for a short period of time but gfs_drop_inode() could take a
> > while to finish. It has a blocking page write that needs to make sure the
> > data gets synced to storage before it can return. What makes matters worse
> > is that inode_lock is a global lock that could block non-GFS threads. One
> > would think a quick fix is to drop the inode_lock at the beginning of
> > gfs_drop_inode() and then re-acquire it after gfs syncs the page.
> > Unfortunately, inode_lock is not an exported symbol and GFS is an
> > out-of-tree filesystem that has to be compiled as a kernel module. So
> > this trick won't work for GFS.
>
> Actually, it can work. inode_lock is not private and gfs can unlock/lock
> it:
>
> --- gfs-orig/ops_super.c 2009-01-22 13:33:51.000000000 +0100
> +++ gfs/ops_super.c 2009-04-06 13:07:06.000000000 +0200
> @@ -9,6 +9,7 @@
> #include <linux/statfs.h>
> #include <linux/seq_file.h>
> #include <linux/mount.h>
> +#include <linux/writeback.h>
>
> #include "gfs.h"
> #include "dio.h"
> @@ -68,8 +69,11 @@
> if (ip &&
> !inode->i_nlink &&
> S_ISREG(inode->i_mode) &&
> - !sdp->sd_args.ar_localcaching)
> + !sdp->sd_args.ar_localcaching) {
> + spin_unlock(&inode_lock);
> gfs_sync_page_i(inode, DIO_START | DIO_WAIT);
> + spin_lock(&inode_lock);
> + }
> generic_drop_inode(inode);
> }
>
> Tomorrow I'll give it a try, there's no time to test it today.
I added the required
EXPORT_SYMBOL(inode_lock);
line to fs/inode.c, recompiled the kernel and the modules.
Starting mailman in the test environment no longer produced the almost
instant freeze. I started/stopped mailman several times and the system
worked just fine. So I believe the patch above, together with the
additional line in fs/inode.c, fixes the reported problem. I don't know
whether modifying fs/inode.c is acceptable or not...
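For anyone who wants to see the locking pattern in isolation, here is a
minimal userspace sketch of it. It is only an analogy, not kernel code: a
pthread mutex stands in for inode_lock, and the names global_lock,
slow_flush(), final_drop(), drop_object() and lock_is_free() are all made up
for illustration. The point is simply that the global lock is released
around the blocking step and re-taken before the final drop, which expects
the lock to be held.

```c
/* Userspace analogy only. A pthread mutex stands in for the kernel's
 * inode_lock; every name below is hypothetical, not a GFS or kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

struct object {
    int dirty;                  /* needs a slow flush before the drop */
    int dropped;                /* set once the final drop has run */
    int lock_free_during_flush; /* was global_lock available while flushing? */
};

/* Returns 1 if global_lock can be taken right now (and releases it again). */
static int lock_is_free(void)
{
    if (pthread_mutex_trylock(&global_lock) != 0)
        return 0;
    pthread_mutex_unlock(&global_lock);
    return 1;
}

/* Stand-in for the blocking page sync. A real flush would block here;
 * we only record whether others could have taken the lock meanwhile. */
static void slow_flush(struct object *obj)
{
    obj->lock_free_during_flush = lock_is_free();
    obj->dirty = 0;
}

/* Stand-in for the generic drop: expects global_lock held and
 * releases it before returning. */
static void final_drop(struct object *obj)
{
    obj->dropped = 1;
    pthread_mutex_unlock(&global_lock);
}

/* Called with global_lock held, like a drop_inode callback. */
static void drop_object(struct object *obj)
{
    if (obj->dirty) {
        pthread_mutex_unlock(&global_lock); /* do not hold it while blocking */
        slow_flush(obj);
        pthread_mutex_lock(&global_lock);   /* final_drop() expects it held */
    }
    final_drop(obj);
}

#ifdef DEMO_MAIN
int main(void)
{
    struct object obj = { 1, 0, 0 };

    pthread_mutex_lock(&global_lock);
    drop_object(&obj);
    assert(obj.dropped && obj.lock_free_during_flush);
    puts("lock was available during the flush");
    return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a small standalone program. Without the
unlock/lock pair in drop_object(), lock_free_during_flush would stay 0,
which is the userspace equivalent of everybody else spinning on the lock.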
Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary