<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Since it seemed to mount okay only 3 minutes earlier, can we assume that the filesystem was initially uncorrupted? Or is that not a valid assumption?<br>
Is there anything we can check or test? Any advice or action at this point is better than waiting for the next filesystem disaster to occur.<br>
<br>
Thanks<br>
-Sev<br>
<br>
Andreas Dilger wrote:
<blockquote cite="mid20060413054056.GP17364@schatzie.adilger.int" type="cite">
  <pre wrap="">On Apr 12, 2006 19:28 -0400, Sev Binello wrote:
[HTML-only email] - it would be preferred if you used plain text, or at least
multipart/mixed for your email to this list...
</pre>
  <blockquote type="cite">
    <pre wrap="">// As soon as the NFS clients start, we get a TON of errors like this:
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125
</pre>
  </blockquote>
  <pre wrap=""> </pre>
  <blockquote type="cite">
    <pre wrap="">// interspersed with some of these:
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
</pre>
  </blockquote>
  <pre wrap="">
These indicate that the kernel ext3 code detected serious corruption of the metadata on the filesystem.
In cases like this, if the filesystem doesn't remount read-only (i.e. it is not mounted with "-o errors=remount-ro"), continuing to use it only makes the corruption progressively worse. These errors don't point to a root cause, however.
</pre>
  <blockquote type="cite">
    <pre wrap="">Would it be a problem if the two 1.8TB filesystems appeared on one host?
</pre>
  </blockquote>
  <pre wrap="">
No. Some of our customers have hundreds of systems, each with two ext3 filesystems of about this size, running on 2.4.21-RHEL3 kernels. The LUNs exported from their RAID storage are all under 2TB. They have not reported similar problems in several years of usage.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
</pre>
</blockquote>
<pre class="moz-signature" cols="100">--
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
</pre>
</body>
</html>