[Linux-cluster] Re: mkfs.gfs2 issue...

Fri Mar 23 19:06:40 UTC 2007

Nick Couchman wrote:
> Bob,
> Thanks for the quick follow-up.  I gave this a shot, but here's the problem: the program seems to be hung on I/O, so it doesn't respond to any signals.  When I try to run gdb against the currently running PID, gdb hangs after the following line:
> Attaching to program: mkfs.gfs2, process 3349
> 
> and does nothing (sometimes it doesn't even get that far).  I can't use <Ctrl-C> to kill the mkfs process, nor does it respond to a kill with a signal 9.  Also, during this mkfs.gfs2 process, the processor is being used quite heavily (80-100%) by the kernel [scsi_wq_1].  This tends to hang up things like login shells pretty badly, and the only way to get it back is to reboot the machine (in case it matters, this is inside a VMware virtual machine).  I'm going to give a couple of other things a try (like changing the elevator so that it's nicer to other processes, hopefully) and see if I can get gdb to work, but so far no such luck.
> 
> Thanks!
> Nick

Hi Nick,

Hm.  Right after it prints "Statfs:", mkfs.gfs2 has finished building
all of its data structures in buffers and starts writing them out to
disk.  So it sounds like the low-level disk I/O is somehow broken.

I suppose you could do:

strace mkfs.gfs2 ...

and it may show which I/O call it is hanging on, but I don't think
it will tell us much beyond that.  Based on what you're telling me,
I suspect this isn't a problem with mkfs.gfs2 but rather something
in a lower layer.

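The next thing I'd try is reading from and writing to the device
directly, without mkfs.gfs2 in the picture.  In other words, first
save the first megabyte of the device to a file, something like: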

dd if=/dev/sdb1 of=/tmp/gronk bs=1M count=1

followed by the reverse, which writes that same megabyte back:

dd if=/tmp/gronk of=/dev/sdb1 bs=1M count=1

If dd can write to the disk, then mkfs.gfs2 should be able to as well.
If dd hangs the same way, the problem is in a layer below mkfs.gfs2.
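
The two dd commands above can be wrapped into one small check.  This is
only a rough sketch, not a tested diagnostic: dd_roundtrip is a made-up
helper name, the scratch file comes from mktemp rather than /tmp/gronk,
and conv=notrunc,fsync assumes GNU dd.  It writes back exactly what it
just read, so it should leave the data unchanged as long as nothing else
is writing to the device at the same time.

```shell
#!/bin/sh
# Sketch only: read the first 1 MiB of a target, write it straight back,
# then re-read and compare. dd_roundtrip is a hypothetical helper name.
dd_roundtrip() {
    target=$1
    scratch=$(mktemp) || return 1

    # Step 1: read 1 MiB from the target into the scratch file.
    if ! dd if="$target" of="$scratch" bs=1M count=1 2>/dev/null; then
        echo "read from $target failed"
        rm -f "$scratch"
        return 1
    fi

    # Step 2: write the same bytes back. conv=notrunc keeps a regular
    # file from being truncated; conv=fsync flushes before dd exits, so
    # a stuck write shows up here rather than later in the page cache.
    if ! dd if="$scratch" of="$target" bs=1M count=1 \
            conv=notrunc,fsync 2>/dev/null; then
        echo "write to $target failed"
        rm -f "$scratch"
        return 1
    fi

    # Step 3: re-read the same number of bytes and compare.
    size=$(wc -c < "$scratch")
    if head -c "$size" "$target" | cmp -s - "$scratch"; then
        echo "round trip on $target OK"
        status=0
    else
        echo "round trip on $target MISMATCH"
        status=1
    fi
    rm -f "$scratch"
    return $status
}
```

You would call it as "dd_roundtrip /dev/sdb1" (as root).  If the I/O
layer really is wedged, expect this to hang in step 1 or step 2 the
same way mkfs.gfs2 does, which is itself the answer we're after.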

Regards,

Bob Peterson
Red Hat Cluster Suite