ext3 filesystem corruption

Sev Binello sev at bnl.gov
Tue Apr 11 16:34:12 UTC 2006


Hi -

    We have had 3 rather major occurrences of ext3 filesystem corruption lately,
    i.e. so bad that we couldn't even mount the filesystem, and fsck didn't help.

    I am looking for pointers that could help us investigate the root cause.

    In general...
   
    We are running RedHat WS 3 Update 6, with kernel 2.4.21-40.2.ELsmp or 2.4.21-37.ELsmp.

    We have a small SAN system that looks like this:

          3 NFS servers, each containing 2 QLogic HBAs connected to 2 QLogic switches,
          connected to an nStor (now Xyratex) 6TB RAID system containing 2 (active-active) controllers.

  On the first 2 occasions one of the controllers was failed over.
  On the 3rd occasion both SAN switches lost power, and the hosts and RAID lost communication.
  

  On all occasions the QLogic failover driver tried to start up on the alternate HBA.

  On the first 2 instances we sort of tried to blame the controller.
  On the 3rd, that was harder to do, since the RAID system and the hosts stayed up
  but lost communication.

  I can provide more detail if anyone has any info on how to proceed.

Thanks
-Sev

-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov



