[Linux-cluster] Freeze with cluster-2.03.11

Wendy Cheng s.wendy.cheng at gmail.com
Fri Apr 3 14:15:39 UTC 2009


Kadlecsik Jozsef wrote:
> On Thu, 2 Apr 2009, Wendy Cheng wrote:
>
>   
>>>> Kadlecsik Jozsef wrote:
>>>>     
>>>>         
>>>>> - commit 82d176ba485f2ef049fd303b9e41868667cebbdb
>>>>>   gfs_drop_inode as .drop_inode replacing .put_inode.
>>>>>   .put_inode was called without holding a lock, but .drop_inode
>>>>>   is called with inode_lock held. Might that be a problem?
>> Based on code reading ...
>> 1. iput() gets inode_lock (a spin lock)
>> 2. iput() calls iput_final()
>> 3. iput_final() calls filesystem drop_inode(), followed by
>> generic_drop_inode()
>> 4. generic_drop_inode() unlocks inode_lock after doing all sorts of fun things
>> with the inode
>>
>> So it looks to me that the generic_drop_inode() call within 
>> gfs_drop_inode() should be removed. Otherwise you would get a double 
>> unlock and a double list free.
>>     
>
> I think those function calls are right: iput_final calls either the 
> filesystem drop_inode function (in this case gfs_drop_inode) or 
> generic_drop_inode. There's no double call of generic_drop_inode. However 
> gfs_sync_page_i (and in turn filemap_fdatawrite and filemap_fdatawait) is 
> now called with inode_lock held, which was not the case in previous versions.
> But I'm just speculating.
>   

It *is* called twice, unless my eyes deceive me:

static inline void iput_final(struct inode *inode)
{
	const struct super_operations *op = inode->i_sb->s_op;
	void (*drop)(struct inode *) = generic_drop_inode;

	if (op && op->drop_inode)
		drop = op->drop_inode; /* gfs call generic_drop_inode() */
	drop(inode); /* second call into generic_drop_inode() again. */
}
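
For anyone who wants to check the call count without booting a test kernel, here is a minimal user-space sketch. It is only a model under stated assumptions: model_inode, model_generic_drop, model_fs_drop, model_iput and model_iput_double are hypothetical names, not kernel symbols, and inode_lock is reduced to an integer depth counter. It shows what a balanced and an unbalanced unlock look like in that model; it does not claim to reproduce the real GFS code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_inode {
	int lock_depth;    /* 1 while the modelled inode_lock is held */
	int generic_calls; /* how many times the generic drop path ran */
};

typedef void (*drop_fn)(struct model_inode *);

/* Models generic_drop_inode(): does its work, then releases the lock. */
static void model_generic_drop(struct model_inode *inode)
{
	inode->generic_calls++;
	inode->lock_depth--;
}

/* Models a filesystem hook that ends by calling the generic path itself. */
static void model_fs_drop(struct model_inode *inode)
{
	model_generic_drop(inode);
}

/* Models iput() + iput_final(): take the lock, dispatch via one pointer. */
static void model_iput(struct model_inode *inode, drop_fn fs_hook)
{
	drop_fn drop = model_generic_drop;

	assert(inode != NULL);
	inode->lock_depth++;   /* lock taken before the final drop */
	if (fs_hook)
		drop = fs_hook;
	drop(inode);
}

/* Models the feared case: the generic path runs again after the hook. */
static void model_iput_double(struct model_inode *inode, drop_fn fs_hook)
{
	model_iput(inode, fs_hook);
	model_generic_drop(inode); /* second unlock of an unlocked lock */
}
```

Calling model_iput() on a zeroed model_inode with model_fs_drop as the hook leaves generic_calls at 1 and lock_depth at 0 in this model. model_iput_double() leaves them at 2 and -1, which is the unbalanced state a double unlock would produce. Which of the two the real code resembles has to be read off the actual ops_super.c and fs/inode.c in your tree.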

>  
>   
>> In short, *remove* line #73 from gfs-kernel/src/gfs/ops_super.c in your 
>> source and let us know how it goes.
>>     
>
> I won't get a chance to start a test before Monday, sorry. 
>
>   

I'll be traveling next week as well. However, a few words of caution here:

Even if this "fix" eventually solves your hang, running GFS on newer 
kernels in a production system is simply *not* a good idea.

-- Wendy



