<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Since it seemed to mount okay only 3 minutes earlier,<br>
can we assume that it was initially uncorrupted?<br>
Or is that not a valid assumption?<br>
<br>
Is there anything that we can check, test, etc.?<br>
Any advice or action at this point is better than waiting for the next
filesystem disaster to occur.<br>
<br>
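One check that needs only the kernel messages quoted below is to compare the
logged block numbers with the device size the kernel reports.<br>
This is a rough Python sketch, not a diagnosis. It assumes that want= and limit=
are in 1 KiB units (my assumption about this 2.4 kernel, not confirmed in this thread);
summarize() and fs_block_kib are names made up for illustration.<br>
<pre wrap="">
```python
import re

# Kernel messages copied from the log excerpt quoted in this thread.
LOG = """\
ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
ext3_free_blocks: bit already cleared for block 49125
08:31: rw=0, want=1891463980, limit=1722264358
08:31: rw=0, want=1824250576, limit=1722264358
"""

WANT_LIMIT = re.compile(r"want=(\d+), limit=(\d+)")
FREED_BLOCK = re.compile(r"not in datazone - block = (\d+)")


def summarize(log, fs_block_kib=1):
    """Compare logged block numbers with the device size in the log.

    ASSUMPTION: want/limit are in 1 KiB units. fs_block_kib is the
    filesystem block size in KiB; 1 gives the largest possible block
    count, so it is the most forgiving choice for this check.
    """
    reads = [(int(w), int(l)) for w, l in WANT_LIMIT.findall(log)]
    limit_kib = reads[0][1]
    max_blocks = limit_kib // fs_block_kib
    # How far past the end of the device each rejected read reached.
    overshoot_kib = [want - limit for want, limit in reads]
    # Freed block numbers that cannot exist on a device of this size.
    freed = [int(b) for b in FREED_BLOCK.findall(log)]
    beyond_device = [b for b in freed if b not in range(max_blocks)]
    return limit_kib, overshoot_kib, beyond_device


limit_kib, overshoot_kib, beyond_device = summarize(LOG)
print("device size (KiB):", limit_kib)
print("reads past end of device by (KiB):", overshoot_kib)
print("freed blocks beyond device:", beyond_device)
```
</pre>
If the unit assumption holds, both "not in datazone" block numbers lie past the
end of the device even with the smallest block size, so they look like garbage
block pointers rather than off-by-a-little values.<br>
<br>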
Thanks<br>
-Sev<br>
<br>
Andreas Dilger wrote:
<blockquote cite="mid20060413054056.GP17364@schatzie.adilger.int"
type="cite">
<pre wrap="">On Apr 12, 2006 19:28 -0400, Sev Binello wrote:
[HTML-only email] - it would be preferred if you used plain text, or at
least multipart/mixed for your email to this list...
</pre>
<blockquote type="cite">
<pre wrap="">//soon as nfs clients start get a TON of errors like this
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: bit already cleared for block 49125
</pre>
</blockquote>
<pre wrap=""><!---->
</pre>
<blockquote type="cite">
<pre wrap="">//interspersed with some of these
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
</pre>
</blockquote>
<pre wrap=""><!---->
These indicate that the kernel ext3 code detected serious corruption of the
metadata on the filesystem. In cases like this, if the filesystem doesn't
remount readonly (i.e. mounted with "-o errors=remount-ro") then it just
makes the corruption progressively worse.
It doesn't point to a root cause, however.
</pre>
<blockquote type="cite">
<pre wrap="">Would it be a problem if the two 1.8TB systems appeared on one host?
</pre>
</blockquote>
<pre wrap=""><!---->
No, some of our customers have hundreds of systems with two ext3 filesystems
of about this size, running on 2.4.21-RHEL3 kernels. The LUNs exported from
the RAID storage are all under 2TB. They have never reported similar problems
over several years of usage.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
</pre>
</blockquote>
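<br>
Regarding the errors=remount-ro option mentioned above, one thing that can be
checked is whether the ext3 entries in /etc/fstab request it.<br>
A minimal Python sketch follows. The fstab contents, device names and mount points
are hypothetical, and ext3_without_remount_ro() is a name made up for
illustration. It only looks at fstab options, so it would not notice an error
behaviour set as a default in the superblock.<br>
<pre wrap="">
```python
# Hypothetical fstab contents; device names and mount points are made up.
SAMPLE_FSTAB = """\
# device     mountpoint  type  options                    dump pass
/dev/sdd1    /export/a   ext3  defaults                   1 2
/dev/sde1    /export/b   ext3  defaults,errors=remount-ro 1 2
/dev/sda1    /boot       ext2  defaults                   1 2
"""


def ext3_without_remount_ro(fstab_text):
    """Return mount points of ext3 entries lacking errors=remount-ro."""
    missing = []
    for line in fstab_text.splitlines():
        # Drop comments, then split the entry into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # A usable entry has 4 to 6 fields (dump and pass are optional).
        if len(fields) not in (4, 5, 6):
            continue
        device, mountpoint, fstype, options = fields[:4]
        if fstype == "ext3" and "errors=remount-ro" not in options.split(","):
            missing.append(mountpoint)
    return missing


print(ext3_without_remount_ro(SAMPLE_FSTAB))
```
</pre>
On a real host the text would come from reading /etc/fstab instead of the sample string.<br>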
<br>
<br>
<pre class="moz-signature" cols="100">--
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
</pre>
</body>
</html>