<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi -<br>
<br>
In case this helps, we got the following messages from EXT3 before the filesystem failed. Does anyone recognize these?<br>
<br>
// Seems to mount okay:
<pre>Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,33), internal journal
Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.
Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem with ordered data mode.
</pre>
// As soon as the NFS clients start, we get a TON of errors like this:
<pre>Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125
</pre>
// Interspersed with some of these:
<pre>Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
</pre>
Then we had to reboot, and the filesystem is basically shot.<br>
<br>
Thanks<br>
-Sev<br>
<br>
Sev Binello wrote:
<blockquote cite="mid443BDA84.7010102@bnl.gov" type="cite">Hi -<br>
<br>
We have had 3 rather major occurrences of ext3 filesystem corruption lately, i.e. so bad that we couldn't even mount, and fsck didn't help. I am looking for pointers that could help us investigate the root cause.<br>
<br>
In general...<br>
We are running RedHat WS 3 Update 6, kernel 2.4.21-40.2.ELsmp or 2.4.21-37.ELsmp.<br>
<br>
We have a small SAN system that looks like this: 3 NFS servers, each containing 2 QLogic HBAs, connected to 2 QLogic switches, which are connected to an nStor (now Xyratex) 6TB RAID system containing 2 (active-active) controllers.<br>
<br>
On the first 2 occasions, one of the controllers was failed over. On the 3rd occasion, both SAN switches lost power, and the hosts and the RAID lost communication. On all occasions, the QLogic failover driver tried to start up on the alternate HBA.<br>
<br>
In the first 2 instances we were inclined to blame the controller. On the 3rd, that was harder to do, since the RAID system and the hosts stayed up but lost communication.<br>
<br>
I can provide more detail if anyone has any information on how to proceed.<br>
<br>
Thanks<br>
-Sev </blockquote> <pre class="moz-signature" cols="100">-- 
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
</pre> </body> </html>