[Linux-cluster] Cluster1, RHEL4 gfs_fsck

Wes Young wcyoung at buffalo.edu
Tue May 20 17:18:11 UTC 2008


On May 20, 2008, at 12:40 PM, Bob Peterson wrote:

> On Mon, 2008-05-19 at 16:18 -0400, Wes Young wrote:
>> I'm having a little trouble with an older installation of RHEL4,
>> cluster/GFS.
>>
>> One of my cluster nodes crashed the other day. When I brought it back
>> up, I got this error:
>>
>> GFS: Trying to join cluster "lock_dlm", "oss:mydisk"
>> GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS...
>> GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock...
>> GFS: fsid=oss:mydisk.0: jid=0: Looking at journal...
>> attempt to access beyond end of device
>> sdb: rw=0, want=19149432840, limit=858673152
>> GFS: fsid=oss:mydisk.0: fatal: I/O error
>
> Hi Wes,
>
> Sorry for the long post, but this needs some explanation.
>
> From your email, it sounds like you have corruption in your
> resource group index file (rindex).  You might be the victim
> of this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=436383
>
> If so, there's a fix to gfs_fsck to repair the damage.  This is
> associated with this bug record:
> https://bugzilla.redhat.com/show_bug.cgi?id=440896
>
> While working on that bug, I discovered some kinds of
> corruption that confuse the gfs_fsck's rindex repair code.
> That's described in bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=442271
>
> I don't think any of these fixes are generally available
> yet, except in patch form; I think they're scheduled for
> 4.7.  The last one, 442271, is only written against RHEL5
> at the moment, so I don't have plans to fix it in RHEL4 yet.
>
> So here's what I recommend:
>
> First, determine for sure if this is the problem by doing
> something like this:
>
> mount the file system
> gfs_tool rindex /mnt/gfs | grep "4294967292"
> (where /mnt/gfs is your mount point)
> umount the file system


That's the problem, though: it won't actually let me mount the "disk"
because of this error.

Sounds like my best option is to try to patch the gfs_fsck code in
RHEL4 and see if it still segfaults on me...

If that doesn't work, I'm guessing a move to RHEL5 would be the next
step, but given the actual value of the data, it's probably not worth it
at this point.

Thanks for the info. I'll let you know how it goes.
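
For reference, the numbers in the kernel log can be sanity-checked with a bit of shell arithmetic. This is a rough sketch. It assumes the "want" and "limit" values in the "attempt to access beyond end of device" message are counts of 512-byte sectors. The reading of 4294967292 as a wrapped-around -4 is only a guess at why that value would show up in a damaged rindex; the bug reports don't confirm it.

```shell
#!/bin/sh
# Sanity-check the numbers from the kernel log quoted above.
# Assumption: "want" and "limit" are counts of 512-byte sectors.
want=19149432840
limit=858673152

want_bytes=$(( want * 512 ))
limit_bytes=$(( limit * 512 ))

echo "device size:   $limit_bytes bytes"
echo "requested end: $want_bytes bytes"
# Integer division, so this rounds down.
echo "overshoot:     $(( want / limit ))x the device size"

# The value suggested for grepping in the rindex dump.
# Guess: 4294967292 is 2^32 - 4, i.e. -4 stored in an unsigned 32-bit field.
printf 'hex:           0x%X\n' 4294967292
echo "2^32 - 4:      $(( (1 << 32) - 4 ))"
```

If the sector-unit assumption holds, the read GFS attempted would end roughly 22 times past the end of sdb. That would fit a garbage block address coming out of a corrupted rindex better than a device that had merely shrunk.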
--
Wes Young
Network Security Analyst
CIT - University at Buffalo
  -----------------------------------------------
| my OpenID:        | http://tinyurl.com/2zu2d3 |
  -----------------------------------------------