[Linux-cluster] Freeze with cluster-2.03.11

Kadlecsik Jozsef kadlec at mail.kfki.hu
Tue Apr 7 14:01:08 UTC 2009


Hi, 

On Mon, 6 Apr 2009, Kadlecsik Jozsef wrote:

> On Sun, 5 Apr 2009, Wendy Cheng wrote:
> 
> > Based on code reading ...
> > 1. iput() gets inode_lock (a spin lock)
> > 2. iput() calls iput_final()
> > 3. iput_final() calls gfs_drop_inode() that calls
> >    generic_drop_inode()
> > 4. generic_drop_inode() unlocks inode_lock.
> > 
> > In theory, this logic violates the intended use of a spin lock, which is
> > expected to be held only for a short period of time, but gfs_drop_inode()
> > could take a while to finish. It does a blocking page write that has to
> > make sure the data gets synced to storage before it can return. To make
> > matters worse, inode_lock is a global lock, so it could block non-GFS
> > threads as well. One would think a quick fix is to drop inode_lock at the
> > beginning of gfs_drop_inode() and then re-acquire it after GFS syncs the
> > page. Unfortunately, inode_lock is not an exported symbol and GFS is an
> > out-of-tree filesystem that has to be compiled as a kernel module. So
> > this trick won't work for GFS.
> 
> Actually, it can work. inode_lock is not private and gfs can unlock/lock 
> it:
> 
> --- gfs-orig/ops_super.c	2009-01-22 13:33:51.000000000 +0100
> +++ gfs/ops_super.c	2009-04-06 13:07:06.000000000 +0200
> @@ -9,6 +9,7 @@
>  #include <linux/statfs.h>
>  #include <linux/seq_file.h>
>  #include <linux/mount.h>
> +#include <linux/writeback.h>
>  
>  #include "gfs.h"
>  #include "dio.h"
> @@ -68,8 +69,11 @@
>  	if (ip &&
>  	    !inode->i_nlink &&
>  	    S_ISREG(inode->i_mode) &&
> -	    !sdp->sd_args.ar_localcaching)
> +	    !sdp->sd_args.ar_localcaching) {
> +		spin_unlock(&inode_lock);
>  		gfs_sync_page_i(inode, DIO_START | DIO_WAIT);
> +		spin_lock(&inode_lock);
> +	}
>  	generic_drop_inode(inode);
>  }
> 
> Tomorrow I'll give it a try, there's no time to test it today.

I added the required 

EXPORT_SYMBOL(inode_lock);

line to fs/inode.c, recompiled the kernel and the modules.

With the patched kernel and modules, starting mailman in the test
environment no longer produced the almost instant freeze. I started and
stopped mailman several times and the system worked just fine. So I
believe the patch above, together with the additional line in
fs/inode.c, fixes the reported problem. I don't know whether modifying
fs/inode.c is acceptable or not...
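
For anyone following the locking order, here is a minimal userspace sketch
of the pattern used in the patch: release the lock around the blocking
call, then take it again before handing off to the function that expects
it to be held. It is an analogy only, not kernel code. fake_inode_lock,
slow_sync(), generic_drop(), patched_drop() and fake_iput() are
hypothetical stand-ins for inode_lock, gfs_sync_page_i(),
generic_drop_inode(), gfs_drop_inode() and iput(), and a pthread mutex
replaces the spin lock. The lock_held flag is only meaningful in this
single-threaded demo.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's global inode_lock. */
static pthread_mutex_t fake_inode_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping for the demo only (valid because it is single-threaded). */
static bool lock_held = false;
static int slow_syncs_done = 0;
static int drops_done = 0;

static void fake_lock(void)
{
	pthread_mutex_lock(&fake_inode_lock);
	lock_held = true;
}

static void fake_unlock(void)
{
	lock_held = false;
	pthread_mutex_unlock(&fake_inode_lock);
}

/* Stand-in for gfs_sync_page_i(): may block, so it must run unlocked. */
static void slow_sync(void)
{
	assert(!lock_held);
	slow_syncs_done++;
}

/* Stand-in for generic_drop_inode(): entered locked, releases the lock. */
static void generic_drop(void)
{
	assert(lock_held);
	drops_done++;
	fake_unlock();
}

/* Stand-in for the patched gfs_drop_inode(). */
static void patched_drop(bool needs_sync)
{
	assert(lock_held);
	if (needs_sync) {
		fake_unlock();	/* do not hold the lock across a blocking call */
		slow_sync();
		fake_lock();	/* generic_drop() expects the lock to be held */
	}
	generic_drop();
}

/* Stand-in for the iput() path; returns true if the lock ends up released. */
static bool fake_iput(bool needs_sync)
{
	fake_lock();
	patched_drop(needs_sync);
	return !lock_held;
}
```

Calling fake_iput(true) goes through the sync path and fake_iput(false)
through the plain path; in both cases the lock is released exactly once,
by generic_drop(). The sketch says nothing about what other threads may
do to the inode during the unlocked window.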

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary



