jamesc at exa.com
Tue Sep 2 14:15:25 UTC 2008
On Tue, 2 Sep 2008, David Teigland wrote:
> On Mon, Sep 01, 2008 at 07:55:48PM -0400, James Chamberlain wrote:
>> Hi all,
>> Since I sent the message below, the aforementioned cluster crashed. Now I
>> can't mount the scratch112 filesystem. Attempts to do so crash the
>> node trying to mount it. If I run gfs_fsck against it, I see the following:
>> # gfs_fsck -nv /dev/s12/scratch112
>> Initializing fsck
>> Initializing lists...
>> Initializing special inodes...
>> Validating Resource Group index.
>> Level 1 check.
>> 5834 resource groups found.
>> Setting block ranges...
>> Can't seek to last block in file system: 4969529913
>> Unable to determine the boundaries of the file system.
>> Freeing buffers.
>> Not being able to determine the boundaries of the file system seems
>> like a very bad thing. However, LVM didn't complain in the slightest
>> when I expanded the logical volume. How can I recover from this?
> Looks like the killed gfs_grow left your fs in a bad condition.
> I believe Bob Peterson has addressed that recently.
I think it was in a bad condition before I hit ^C rather than because I
did so. As I mentioned, I was getting the lm_dlm_cancel messages before I
hit ^C. But I'd agree that, one way or another, the gfs_grow operation
left the fs in a bad state.
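
For what it's worth, a quick sanity check here (a rough sketch, assuming
the filesystem uses the default 4 KiB GFS block size) is to compare the
byte offset gfs_fsck tried to seek to against the actual size of the
logical volume:

# blockdev --getsize64 /dev/s12/scratch112
# echo $((4969529913 * 4096))

If the second number exceeds the first, the resource group index claims
blocks past the end of the device, which would explain why gfs_fsck
can't establish the filesystem boundaries.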
>>> I'm trying to grow a GFS filesystem. I've grown this filesystem
>>> before and everything went fine. However, when I issued gfs_grow
>>> this time, I saw the following messages in my logs:
>>> Aug 29 21:04:13 s12n02 kernel: lock_dlm: lm_dlm_cancel 2,17 flags 80
>>> Aug 29 21:04:13 s12n02 kernel: lock_dlm: lm_dlm_cancel skip 2,17
>>> flags 100
>>> Aug 29 21:04:14 s12n02 kernel: lock_dlm: lm_dlm_cancel 2,17 flags 80
>>> Aug 29 21:04:14 s12n02 kernel: dlm: scratch112: (14239) dlm_unlock:
>>> 10241 busy 2
>>> Aug 29 21:04:14 s12n02 kernel: lock_dlm: lm_dlm_cancel rv -16 2,17
>>> flags 40080
>>> The last three lines of these log entries repeat themselves once a
>>> second until I hit ^C. The filesystem still appears to be up and
>>> accessible. Any thoughts on what's going on here and what I can do
>>> about it?
> Should be fixed by
Thanks Dave. Any idea if there's a corresponding patch for RHEL 4?