[Linux-cluster] GFS2: processes stuck in "just schedule"

Allen Belletti allen at isye.gatech.edu
Thu Dec 3 22:30:29 UTC 2009

Hi All,

After Steve and the Red Hat guys dug into my nasty crashdump (thanks 
all!), I believe I'm down to the last GFS2 problem on our mail cluster, 
but it's a common one.

I've always had trouble with processes getting stuck on GFS2 access and 
queuing up.  Since the 5.4 upgrade and moving to the proper GFS2 kernel 
module, it's changed but not gone away.  Every few days now, I'm seeing 
processes getting stuck with WCHAN=just_schedule.  Once this starts 
happening, both cluster nodes accumulate them rapidly, which 
eventually brings IO to a halt.  The only way I've found to escape is 
via a reboot, sometimes of one node, sometimes of both.
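
For reference, here is a minimal sketch of how processes waiting in a 
given wait channel might be listed.  It assumes the kernel exposes the 
wait channel symbol in /proc/<pid>/wchan; the function names 
(read_wchan, find_stuck) are hypothetical and only for illustration.

```python
import os


def read_wchan(pid, proc_root="/proc"):
    """Return the wait channel name for pid, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip()
    except OSError:
        # The process may have exited, or the file may not be readable.
        return None


def find_stuck(target="just_schedule", proc_root="/proc"):
    """Return a sorted list of PIDs whose wait channel equals target."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        if read_wchan(entry, proc_root) == target:
            stuck.append(int(entry))
    return sorted(stuck)


if __name__ == "__main__":
    for pid in find_stuck():
        print(pid)
```

Run periodically on each node, something like this could show when 
the pile-up begins and which processes are involved first.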

Since there's no crash, I don't get any useful debug information.  
Outside of this one repeating glitch, performance is great and all is 
well.  If anyone can suggest ways of gathering more data about the 
problem, or possible solutions, I would be grateful.


