<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Hi -<br>
<br>
In case this helps, we got the following messages from EXT3 before the filesystem failed.<br>
Does anyone recognize these?<br>
<br>
<b>//seems to mount okay </b><br>
<small> <small> Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19,
19 August 2002 on sd(8,33), internal journal <br>
Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.</small><br>
<small> Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem
with ordered data mode.</small><br>
</small><small><big><b><br>
//as soon as the NFS clients start, we get a TON of errors like this</b></big><br>
<small>Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device
sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block =
3443589120,
count = 1<br>
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232,
count = 1<br>
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: bit already cleared for block 49125</small><br>
<br>
<big><big><b>//interspersed with some of these </b></big></big><big><b><br>
</b></big><small>Mar 26 00:10:56 acnlin82 kernel: attempt to access
beyond end of device<br>
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980,
limit=1722264358<br>
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device<br>
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576,
limit=1722264358<br>
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device</small></small><br>
<br>
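A side note for anyone decoding the lines above: here is a minimal Python sketch that pulls the device and the want/limit values out of one "beyond end of device" line. It assumes the kernel prints the device as hexadecimal major:minor and that want and limit are in the same units (I believe 1 KiB blocks on 2.4 kernels, but please check the kernel source). The names LINE and decode are made up for this example.<br>
<pre>
```python
import re

# Assumption: the device prefix is hexadecimal "major:minor" (e.g. 08:31),
# and want/limit are reported in the same units as each other.
LINE = re.compile(
    r"([0-9a-fA-F]{2}):([0-9a-fA-F]{2}): rw=(\d+), want=(\d+), limit=(\d+)"
)

def decode(line):
    """Return (major, minor, want, limit, overshoot), or None if no match."""
    m = LINE.search(line)
    if m is None:
        return None
    major = int(m.group(1), 16)   # hex major number
    minor = int(m.group(2), 16)   # hex minor number
    want = int(m.group(4))        # block the kernel tried to reach
    limit = int(m.group(5))       # size of the device
    # How far past the end of the device the request landed (0 if in range).
    overshoot = max(0, want - limit)
    return major, minor, want, limit, overshoot

# First want/limit line from the log above, rejoined onto one line.
sample = ("Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, "
          "want=1891463980, limit=1722264358")
print(decode(sample))
```
</pre>
Under those assumptions, 08:31 decodes to major 8, minor 49, which matches the sd(8,49) in the EXT3-fs errors, and the first request lands 169199622 units past the limit.<br>
<br>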
Then we had to reboot, and the filesystem is basically shot.<br>
<br>
Thanks<br>
-Sev<br>
<br>
Sev Binello wrote:
<blockquote cite="mid443BDA84.7010102@bnl.gov" type="cite">Hi -
<br>
<br>
We have had 3 rather major occurrences of ext3 filesystem corruption
lately,
<br>
i.e. so bad we couldn't even mount, and fsck didn't help.
<br>
<br>
I am looking for pointers that could help us investigate the root
cause.
<br>
<br>
In general...
<br>
We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
2.4.21-37.ELsmp
<br>
<br>
We have a small SAN system that looks like this:
<br>
3 NFS servers, each containing 2 QLogic HBAs connected to
2 QLogic switches
<br>
connected to an nStor (now Xyratex) 6TB RAID system containing
2 (active-active) controllers.
<br>
<br>
On the first 2 occasions one of the controllers was failed over.
<br>
On a 3rd occasion both SAN switches lost power, and the hosts and
raid lost communication.
<br>
<br>
<br>
On all occasions the QLogic failover driver tried to start up on the
alternate HBA.
<br>
<br>
On the first 2 instances we sort of tried to blame the controller.
<br>
On the 3rd, that was harder to do since the raid system and the hosts
stayed up
<br>
but lost communication.
<br>
<br>
I can provide more detail if anyone has any info on how to proceed.
<br>
<br>
Thanks
<br>
-Sev
<br>
<br>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="100">--
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
</pre>
</body>
</html>