From lists at nerdbynature.de Sat Sep 1 10:04:32 2007
From: lists at nerdbynature.de (Christian Kujau)
Date: Sat, 1 Sep 2007 12:04:32 +0200 (CEST)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To: <564417.24036.qm@web34711.mail.mud.yahoo.com>
References: <564417.24036.qm@web34711.mail.mud.yahoo.com>
Message-ID:

On Wed, 29 Aug 2007, Kannan Raghuprasath wrote:
> EXT3-fs error (device sdb1): ext3_add_entry: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> Aborting journal on device sdb1.
> ext3_abort called.
> EXT3-fs error (device sdb1): ext3_journal_start_sb:
> Detected aborted journal

Probably too late, but: are there any device-related errors in the log?
Have you checked the device for errors? (e.g. a simple
dd if=/dev/sdb of=/dev/null should do).

If the device (and the cabling) is fine, did you try fsck.ext3 yet? If
so, what does it say?

C.
-- 
BOFH excuse #261:

The Usenet news is out of date

From lists at nerdbynature.de Mon Sep 3 06:46:40 2007
From: lists at nerdbynature.de (Christian Kujau)
Date: Mon, 3 Sep 2007 08:46:40 +0200 (CEST)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To: <775678.54481.qm@web34706.mail.mud.yahoo.com>
References: <775678.54481.qm@web34706.mail.mud.yahoo.com>
Message-ID:

On Sun, 2 Sep 2007, Kannan Raghuprasath wrote:
> After fsck.ext3 I am able to mount the device again

Wow, fsck did quite a few things, you probably checked /lost+found for
missing data. Did you fsck again after fsck fixed things?

> but after another 20 hours of backup I am seeing the
> same error again and partition remounts itself as
> read-only.

Unless you're hitting some crude, new, unidentified bug in ext3,
I still think the corruption is hardware related. But the logs are
clean, you say. Hm, can you try a current (vanilla) kernel? Lots of
stuff happened since 2.6.11 (03/2005). Not being an expert, only
CONFIG_LBD and CONFIG_LSF come to mind, but I assume the FC kernel has
set these.
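Whether a given kernel build has these two options set can be read from its build configuration. The following is only a minimal sketch in Python, assuming the configuration is available as plain text (often /boot/config-<release> on Fedora-style systems, which is an assumption about the machine, not something stated in this thread). The helper name check_large_device_options and the sample text are made up for illustration.

```python
import re

# The two options named in the message above.
OPTIONS = ("CONFIG_LBD", "CONFIG_LSF")


def check_large_device_options(config_text, options=OPTIONS):
    """Return {option: value}; value is the configured setting ('y', 'm', ...)
    or None when the option is absent or commented out as 'is not set'."""
    result = {name: None for name in options}
    for line in config_text.splitlines():
        # Enabled options look like CONFIG_FOO=y; disabled ones are comments.
        match = re.match(r"^(CONFIG_\w+)=(\S+)$", line.strip())
        if match and match.group(1) in result:
            result[match.group(1)] = match.group(2)
    return result


# Hypothetical config excerpt, not taken from the poster's machine.
sample = """\
# CONFIG_LSF is not set
CONFIG_LBD=y
CONFIG_EXT3_FS=m
"""

status = check_large_device_options(sample)
for name, value in status.items():
    print(name, "enabled" if value in ("y", "m") else "not set")
```

On a real system the text would come from reading the config file of the running kernel instead of the sample string; an option reported as not set would be worth raising on the list before rebuilding anything.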
Christian.
-- 
BOFH excuse #378:

Operators killed by year 2000 bug bite.

From jan.stobbe at netropol.de Mon Sep 3 06:54:47 2007
From: jan.stobbe at netropol.de (Jan Stobbe)
Date: Mon, 03 Sep 2007 08:54:47 +0200
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To: <564417.24036.qm@web34711.mail.mud.yahoo.com>
References: <564417.24036.qm@web34711.mail.mud.yahoo.com>
Message-ID: <46DBAFB7.8030803@netropol.de>

Hi Kannan,

we had similar problems with ext3. After a month of searching for the
error we changed the mainboard of the server. The mainboard was too old to
work properly with the new Areca RAID controller we used.
Without heavy load the device was o.k.

Best regards, Jan

> Hi,
>
> I have a Fedora Core 4 machine (kernel-2.6.11-1.1369_FC4smp)
> connected to external DAS using an
> Ultra 320 SCSI controller card. The DAS is configured
> as RAID 5 of 3.5 TB. I partitioned this array into two
> partitions of size 1.9 TB and 1.6 TB. These are
> assigned to 2 different LUNs so that these two appear
> as two partitions on my Linux machine.
>
> I use this DAS for backup and after backing up for 30
> hours I observe that the partition gets remounted as
> read-only with the following error message in dmesg,
>
> EXT3-fs error (device sdb1): ext3_add_entry: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> Aborting journal on device sdb1.
> ext3_abort called.
> EXT3-fs error (device sdb1): ext3_journal_start_sb:
> Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device sdb1) in start_transaction:
> Journal has aborted
> EXT3-fs error (device sdb1) in ext3_create: IO failure
> EXT3-fs error (device sdb1): ext3_readdir: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> EXT3-fs error (device sdb1): ext3_readdir: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> EXT3-fs error (device sdb1): ext3_readdir: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> EXT3-fs error (device sdb1): ext3_readdir: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
> EXT3-fs error (device sdb1): ext3_readdir: bad entry
> in directory #2: directory entry across blocks -
> offset=1080, inode=135216, rec_len=4132, name_len=25
>
>
> Any suggestions on how to solve this problem?
>
> Best regards and thanks in advance for your help,
>
> Raghu

-- 
Jan Stobbe                   Netropol Digitale Systeme GmbH
jan.stobbe at netropol.de   Stresemannstrasse 161
Tel: +49 40 284167-20        D-22769 Hamburg/Germany
Fax: +49 40 284167-40        http://www.netropol.de/

Registergericht:                      Geschäftsführer:
Handelsregister Hamburg, HRB 75989    Jan Stobbe, Karl-Heinz Dellwo

From raghuprasath at yahoo.com Mon Sep 3 06:06:16 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Sun, 2 Sep 2007 23:06:16 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To:
Message-ID: <757952.48387.qm@web34712.mail.mud.yahoo.com>

Hi,

thanks for the reply. There doesn't seem to be any device-related error.
I did try fsck.ext3 and here is the output:

=========================================================
e2fsck 1.37 (21-Mar-2005)
/dev/sdb1: recovering journal
/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes

Inode 23150593 was part of the orphaned inode list. FIXED.
Inode 23258292 was part of the orphaned inode list. FIXED.
Inode 23365847 was part of the orphaned inode list. FIXED.
Inode 23365848 was part of the orphaned inode list. FIXED.
Inode 23645369 was part of the orphaned inode list. FIXED.
Inode 23645370 was part of the orphaned inode list. FIXED.
Inode 23645371 was part of the orphaned inode list. FIXED.
Inode 23645372 was part of the orphaned inode list. FIXED.
Deleted inode 23989501 has zero dtime. Fix? yes

Inode 23989502 was part of the orphaned inode list. FIXED.
Inode 23989503 was part of the orphaned inode list. FIXED.
Deleted inode 23989504 has zero dtime. Fix? yes

Deleted inode 24068705 has zero dtime. Fix? yes

Inode 24068706 was part of the orphaned inode list. FIXED.
Inode 24068707 was part of the orphaned inode list. FIXED. Inode 24068708 was part of the orphaned inode list. FIXED. Inode 24154261 was part of the orphaned inode list. FIXED. Inode 24154262 was part of the orphaned inode list. FIXED. Inode 24154263 was part of the orphaned inode list. FIXED. Inode 24154264 was part of the orphaned inode list. FIXED. Inode 24319381 was part of the orphaned inode list. FIXED. Inode 24319382 was part of the orphaned inode list. FIXED. Inode 24319383 was part of the orphaned inode list. FIXED. Inode 24319384 was part of the orphaned inode list. FIXED. Inode 24391269 was part of the orphaned inode list. FIXED. Inode 24391270 was part of the orphaned inode list. FIXED. Inode 24391271 was part of the orphaned inode list. FIXED. Inode 24391272 was part of the orphaned inode list. FIXED. Inode 24469205 was part of the orphaned inode list. FIXED. Inode 24469206 was part of the orphaned inode list. FIXED. Inode 24469207 was part of the orphaned inode list. FIXED. Inode 24469208 was part of the orphaned inode list. FIXED. Inode 24699129 was part of the orphaned inode list. FIXED. Inode 24699130 was part of the orphaned inode list. FIXED. Inode 24699131 was part of the orphaned inode list. FIXED. Inode 24699132 was part of the orphaned inode list. FIXED. Inode 24784933 was part of the orphaned inode list. FIXED. Inode 24784934 was part of the orphaned inode list. FIXED. Inode 24784935 was part of the orphaned inode list. FIXED. Deleted inode 24784936 has zero dtime. Fix? yes Inode 24842705 was part of the orphaned inode list. FIXED. Inode 24842706 was part of the orphaned inode list. FIXED. Inode 24842707 was part of the orphaned inode list. FIXED. Inode 24842708 was part of the orphaned inode list. FIXED. Inode 67857521 was part of the orphaned inode list. FIXED. Inode 68381037 was part of the orphaned inode list. FIXED. Inode 68509537 was part of the orphaned inode list. FIXED. Inode 68582341 was part of the orphaned inode list. FIXED. 
Inode 69041813 was part of the orphaned inode list. FIXED. Inode 69119221 was part of the orphaned inode list. FIXED. Inode 69369641 was part of the orphaned inode list. FIXED. Inode 69369642 was part of the orphaned inode list. FIXED. Inode 69369643 was part of the orphaned inode list. FIXED. Inode 69369644 was part of the orphaned inode list. FIXED. Inode 134340585 was part of the orphaned inode list. FIXED. Inode 134434341 was part of the orphaned inode list. FIXED. Inode 134714021 was part of the orphaned inode list. FIXED. Inode 134835666 was part of the orphaned inode list. FIXED. Inode 136770309 was part of the orphaned inode list. FIXED. Inode 136778097 was part of the orphaned inode list. FIXED. Inode 138026013 was part of the orphaned inode list. FIXED. Inode 144104178 was part of the orphaned inode list. FIXED. Inode 152740634 was part of the orphaned inode list. FIXED. Deleted inode 229911998 has zero dtime. Fix? yes Deleted inode 229912000 has zero dtime. Fix? yes Deleted inode 235725049 has zero dtime. Fix? yes Deleted inode 235725050 has zero dtime. Fix? yes Deleted inode 235725052 has zero dtime. Fix? yes Inode 237667897 was part of the orphaned inode list. FIXED. Inode 242319975 was part of the orphaned inode list. FIXED. Inode 242319976 was part of the orphaned inode list. FIXED. Duplicate blocks found... invoking duplicate block passes. 
Pass 1B: Rescan for duplicate/bad blocks Duplicate/bad block(s) in inode 16524: 32455728 32455830 32472317 Duplicate/bad block(s) in inode 16534: 32488498 32488564 32488689 32504895 32505011 32505013 32505022 Duplicate/bad block(s) in inode 16574: 40774646 40827888 40827889 40827890 40827891 40827892 40827893 40827901 40827902 40827903 Duplicate/bad block(s) in inode 16994: 74327918 74381301 74381302 74381303 74381304 74381305 74381306 74381307 74381308 74381309 Duplicate/bad block(s) in inode 2113541: 4657478 4661666 4674020 4710821 Duplicate/bad block(s) in inode 2113551: 4714992 4714993 4714994 4714995 4714996 4714997 4714998 4714999 4715000 4715001 4715002 4715003 4715004 4715005 4715006 4715007 4715248 4715249 4715250 4715251 4715252 Duplicate/bad block(s) in inode 2113601: 7206128 7206129 7206130 7206131 7206138 7206139 7206140 7206141 7206142 7206143 Duplicate/bad block(s) in inode 2113621: 8515824 8515825 8515826 8515827 8515828 8515829 8515830 8515837 8515838 8515839 Duplicate/bad block(s) in inode 16793627: 35000268 35000283 35004238 35012482 35016461 35016467 35016607 35041073 Duplicate/bad block(s) in inode 16793637: 35049360 35053476 35061525 Duplicate/bad block(s) in inode 16793647: 35714288 35714289 35714290 35714291 35714292 35714293 35714294 35714301 35714302 35714303 Duplicate/bad block(s) in inode 25247794: 52760866 52760918 52760985 52761002 52765090 52765095 52769027 52769045 52769112 52777269 52777409 52793748 52797741 52818227 52818431 Duplicate/bad block(s) in inode 67125291: 136578816 136578817 136578818 136578819 136578820 136578821 136578822 136578829 136578830 136578831 Duplicate/bad block(s) in inode 67125331: 138219553 138219554 138219557 138219569 138219570 138219575 138219576 138219577 138219580 138219644 138219646 138219669 138219724 138219764 138223636 138223666 138223670 138223672 138223714 138223761 138223763 138223852 138227721 138227723 138227725 138227747 138227750 138227754 138227759 138227760 138227762 138227763 138227765 
138227766 138227769 138227770 138227771 138227772 138227774 138227775 138227796 138227824 138227831 138227896 138227902 138227903 138227918 138235912 138235942 138235955 138235964 138236023 138240019 138240059 138240090 138240161 138240186 138244121 138244128 138244144 138244152 138244154 138244155 138244189 138244199 138244218 138244276 138244335 138252389 138252414 138252525 138252526 138256435 138256438 138256471 138256543 138256551 138260501 138260504 138260524 138260534 138260537 138260539 138260582 138260593 138260608 138260637 138260660 138260671 138264637 138264812 138268837 138268897 138272778 138272795 138272809 138272887 138272971 138272989 138273010 138277037 138277056 138277059 138277104 Duplicate/bad block(s) in inode 67125361: 139920385 139920559 139924547 139969634 139982064 139982068 139982069 139982070 139982071 139982072 139982073 139982074 139982075 139982076 139982077 139982078 139982079 Duplicate/bad block(s) in inode 67125371: 140115184 140115185 140115186 140115187 140115188 140115189 140115190 140115197 140115198 140115199 Duplicate/bad block(s) in inode 134234322: 278333096 278341216 278341217 278341218 278341219 278341220 278341221 278341222 278341223 278341224 278341225 278341226 278341227 278341228 278341229 278341230 278341231 278349543 278357756 278374090 278378233 278386314 278386328 278390427 278394608 278394609 278394610 278394611 278394612 278394613 278394614 278394615 278394616 278394617 278394618 278394619 278394620 278394621 278394622 278394623 Duplicate/bad block(s) in inode 134234362: 280430697 280430753 280431012 280434874 280438883 280447045 280447211 280451165 280455255 280467606 280467644 280484078 280488199 280488342 280492052 280492272 280492273 280492274 280492275 280492276 280492277 280492278 280492279 280492280 280492281 280492282 280492283 280492284 280492285 280492286 280492287 280492528 280492529 280492530 280492531 280492532 280492533 280492534 280492535 280492536 280492537 280492538 280492539 280492540 280492541 
280492542 280492543 Pass 1C: Scan directories for inodes with dup blocks. Pass 1D: Reconciling duplicate blocks (There are 18 inodes containing duplicate/bad blocks.) File /20070830.120300.ts (inode #16524, mod time Thu Aug 30 12:13:22 2007) has 3 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.121100.ts (inode #16534, mod time Thu Aug 30 12:21:24 2007) has 7 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.124300.ts (inode #16574, mod time Thu Aug 30 12:53:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.181900.ts (inode #16994, mod time Thu Aug 30 18:29:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.174700.ts (inode #2113541, mod time Wed Aug 29 17:57:21 2007) has 4 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.175500.ts (inode #2113551, mod time Wed Aug 29 18:05:25 2007) has 21 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.183500.ts (inode #2113601, mod time Wed Aug 29 18:45:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.185100.ts (inode #2113621, mod time Wed Aug 29 19:01:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.001100.ts (inode #16793627, mod time Thu Aug 30 00:21:23 2007) has 8 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.001900.ts (inode #16793637, mod time Thu Aug 30 00:29:24 2007) has 3 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. 
File /20070830.002700.ts (inode #16793647, mod time Thu Aug 30 00:37:22 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.084300.ts (inode #25247794, mod time Thu Aug 30 08:53:23 2007) has 15 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.033900.ts (inode #67125291, mod time Thu Aug 30 03:49:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.041100.ts (inode #67125331, mod time Thu Aug 30 04:21:23 2007) has 104 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.043500.ts (inode #67125361, mod time Thu Aug 30 04:45:23 2007) has 17 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.044300.ts (inode #67125371, mod time Thu Aug 30 04:53:22 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.073100.ts (inode #134234322, mod time Thu Aug 30 07:41:23 2007) has 40 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.080300.ts (inode #134234362, mod time Thu Aug 30 08:13:22 2007) has 47 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. Pass 2: Checking directory structure Entry '20070831.101900.ts' in / (2) has deleted/unused inode 17567. Clear? yes Entry '20070831.101900.tsrate' in / (2) has deleted/unused inode 17568. Clear? yes Entry '20070831.101900.0000.info' in / (2) has deleted/unused inode 17569. Clear? yes Entry '20070831.101900.0060.info' in / (2) has deleted/unused inode 17570. Clear? yes Entry '20070831.101900.0120.info' in / (2) has deleted/unused inode 17571. Clear? yes Entry '20070831.101900.0180.info' in / (2) has deleted/unused inode 17572. Clear? 
yes Entry '20070831.101900.0240.info' in / (2) has deleted/unused inode 17573. Clear? yes Entry '20070831.101900.0300.info' in / (2) has deleted/unused inode 17574. Clear? yes Entry '20070831.101900.0360.info' in / (2) has deleted/unused inode 17575. Clear? yes Entry '20070831.101900.0420.info' in / (2) has deleted/unused inode 17576. Clear? yes Entry '20070831.102700.ts' in / (2) has deleted/unused inode 17577. Clear? yes Entry '20070831.102700.tsrate' in / (2) has deleted/unused inode 17578. Clear? yes Entry '20070831.102700.0000.info' in / (2) has deleted/unused inode 17579. Clear? yes Entry '20070831.102700.0060.info' in / (2) has deleted/unused inode 17580. Clear? yes Entry '20070831.102700.0120.info' in / (2) has deleted/unused inode 17581. Clear? yes Entry '20070831.102700.0180.info' in / (2) has deleted/unused inode 17582. Clear? yes Entry '20070831.102700.0240.info' in / (2) has deleted/unused inode 17583. Clear? yes Entry '20070831.102700.0300.info' in / (2) has deleted/unused inode 17584. Clear? yes Entry '20070831.102700.0360.info' in / (2) has deleted/unused inode 17585. Clear? yes Entry '20070831.102700.0420.info' in / (2) has deleted/unused inode 17586. Clear? yes Entry '20070831.103500.ts' in / (2) has deleted/unused inode 17587. Clear? yes Entry '20070831.103500.tsrate' in / (2) has deleted/unused inode 17588. Clear? yes Entry '20070831.103500.0000.info' in / (2) has deleted/unused inode 17589. Clear? yes Entry '20070831.103500.0060.info' in / (2) has deleted/unused inode 17590. Clear? yes Entry '20070831.103500.0120.info' in / (2) has deleted/unused inode 17591. Clear? yes Entry '20070831.103500.0180.info' in / (2) has deleted/unused inode 17592. Clear? yes Entry '20070831.103500.0240.info' in / (2) has deleted/unused inode 17593. Clear? yes Entry '20070831.103500.0300.info' in / (2) has deleted/unused inode 17594. Clear? yes Entry '20070831.103500.0360.info' in / (2) has deleted/unused inode 17595. Clear? 
yes Entry '20070831.103500.0420.info' in / (2) has deleted/unused inode 17596. Clear? yes Entry '20070831.104300.ts' in / (2) has deleted/unused inode 17597. Clear? yes Entry '20070831.104300.tsrate' in / (2) has deleted/unused inode 17598. Clear? yes Entry '20070831.104300.0000.info' in / (2) has deleted/unused inode 17599. Clear? yes Entry '20070831.104300.0060.info' in / (2) has deleted/unused inode 17600. Clear? yes Entry '20070831.104300.0120.info' in / (2) has deleted/unused inode 17601. Clear? yes Entry '20070831.104300.0180.info' in / (2) has deleted/unused inode 17602. Clear? yes Entry '20070831.104300.0240.info' in / (2) has deleted/unused inode 17603. Clear? yes Entry '20070831.104300.0300.info' in / (2) has deleted/unused inode 17604. Clear? yes Entry '20070831.104300.0360.info' in / (2) has deleted/unused inode 17605. Clear? yes Entry '20070831.104300.0420.info' in / (2) has deleted/unused inode 17606. Clear? yes Entry '20070831.105100.ts' in / (2) has deleted/unused inode 17607. Clear? yes Entry '20070831.105100.tsrate' in / (2) has deleted/unused inode 17608. Clear? yes Entry '20070831.105100.0000.info' in / (2) has deleted/unused inode 17609. Clear? yes Entry '20070831.105100.0060.info' in / (2) has deleted/unused inode 17610. Clear? yes Entry '20070831.105100.0120.info' in / (2) has deleted/unused inode 17611. Clear? yes Entry '20070831.105100.0180.info' in / (2) has deleted/unused inode 17612. Clear? yes Entry '20070831.105100.0240.info' in / (2) has deleted/unused inode 17613. Clear? yes Entry '20070831.105100.0300.info' in / (2) has deleted/unused inode 17614. Clear? yes Entry '20070831.105100.0360.info' in / (2) has deleted/unused inode 17615. Clear? yes Entry '20070831.105100.0420.info' in / (2) has deleted/unused inode 17616. Clear? yes Entry '20070831.105900.ts' in / (2) has deleted/unused inode 17617. Clear? yes Entry '20070831.105900.tsrate' in / (2) has deleted/unused inode 17618. Clear? 
yes Entry '20070831.105900.0000.info' in / (2) has deleted/unused inode 17619. Clear? yes Entry '20070831.105900.0060.info' in / (2) has deleted/unused inode 17620. Clear? yes Entry '20070831.105900.0120.info' in / (2) has deleted/unused inode 17621. Clear? yes Entry '20070831.105900.0180.info' in / (2) has deleted/unused inode 17622. Clear? yes Entry '20070831.105900.0240.info' in / (2) has deleted/unused inode 17623. Clear? yes Entry '20070831.105900.0300.info' in / (2) has deleted/unused inode 17624. Clear? yes Entry '20070831.105900.0360.info' in / (2) has deleted/unused inode 17625. Clear? yes Entry '20070831.105900.0420.info' in / (2) has deleted/unused inode 17626. Clear? yes Entry '20070831.110700.ts' in / (2) has deleted/unused inode 17627. Clear? yes Entry '20070831.110700.tsrate' in / (2) has deleted/unused inode 17628. Clear? yes Entry '20070831.110700.0000.info' in / (2) has deleted/unused inode 17629. Clear? yes Entry '20070831.110700.0060.info' in / (2) has deleted/unused inode 17630. Clear? yes Entry '20070831.110700.0120.info' in / (2) has deleted/unused inode 17631. Clear? yes Entry '20070831.110700.0180.info' in / (2) has deleted/unused inode 17632. Clear? yes Entry '20070831.110700.0240.info' in / (2) has deleted/unused inode 17633. Clear? yes Entry '20070831.110700.0300.info' in / (2) has deleted/unused inode 17634. Clear? yes Entry '20070831.110700.0360.info' in / (2) has deleted/unused inode 17635. Clear? yes Entry '20070831.110700.0420.info' in / (2) has deleted/unused inode 17636. Clear? yes Entry '20070831.111500.ts' in / (2) has deleted/unused inode 17637. Clear? yes Entry '20070831.111500.tsrate' in / (2) has deleted/unused inode 17638. Clear? yes Entry '20070831.111500.0000.info' in / (2) has deleted/unused inode 17639. Clear? yes Entry '20070831.111500.0060.info' in / (2) has deleted/unused inode 17640. Clear? yes Entry '20070831.111500.0120.info' in / (2) has deleted/unused inode 17641. Clear? 
yes Entry '20070831.111500.0180.info' in / (2) has deleted/unused inode 17642. Clear? yes Entry '20070831.111500.0240.info' in / (2) has deleted/unused inode 17643. Clear? yes Entry '20070831.111500.0300.info' in / (2) has deleted/unused inode 17644. Clear? yes Entry '20070831.111500.0360.info' in / (2) has deleted/unused inode 17645. Clear? yes Entry '20070831.111500.0420.info' in / (2) has deleted/unused inode 17646. Clear? yes Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached zero-length inode 17565. Clear? yes Unattached zero-length inode 17566. Clear? yes Free blocks count wrong for group #142 (0, counted=37). Fix? yes Free blocks count wrong for group #218 (0, counted=10). Fix? yes Free blocks count wrong for group #259 (4, counted=14). Fix? yes Free blocks count wrong for group #991 (0, counted=10). Fix? yes Free blocks count wrong for group #1068 (4, counted=15). Fix? yes Free blocks count wrong for group #1088 (4, counted=14). Fix? yes Free blocks count wrong for group #1245 (4, counted=14). Fix? yes Free blocks count wrong for group #1610 (0, counted=30). Fix? yes Free blocks count wrong for group #2269 (0, counted=10). Fix? yes Free blocks count wrong for group #4168 (4, counted=14). Fix? yes Free blocks count wrong for group #4219 (0, counted=113). Fix? yes Free blocks count wrong for group #4271 (6, counted=36). Fix? yes Free blocks count wrong for group #4275 (13, counted=23). Fix? yes Free blocks count wrong for group #8494 (0, counted=74). Fix? yes Free blocks count wrong for group #8558 (6, counted=90). Fix? yes Free blocks count wrong (360044482, counted=360044941). Fix? yes Inode bitmap differences: -(17567--17646) -184754182 -184754213 -184754222 -184754278 -184754285 -184754318 -(184754373--184754374) -184754405 -184754414 -184754438 -184754445 -184754470 -184755142 -184755181 -184755213 -184755461 -184755501 -184755504 -184755525 -184755534 Fix? 
yes

/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb1: 2498/243204096 files (0.1% non-contiguous), 126353048/486397989 blocks
=======================================================

After fsck.ext3 I am able to mount the device again, but after another
20 hours of backup I am seeing the same error again and the partition
remounts itself as read-only.

Please provide your suggestions on solving this problem. Thank you once
again.

Best Regards,
Raghu

--- Christian Kujau wrote:

> On Wed, 29 Aug 2007, Kannan Raghuprasath wrote:
> > EXT3-fs error (device sdb1): ext3_add_entry: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > Aborting journal on device sdb1.
> > ext3_abort called.
> > EXT3-fs error (device sdb1): ext3_journal_start_sb:
> > Detected aborted journal
>
> Probably too late, but: are there any device-related
> errors in the log?
> Have you checked the device for errors? (e.g. a simple
> dd if=/dev/sdb of=/dev/null should do).
>
> If the device (and the cabling) is fine, did you try
> fsck.ext3 yet? If so, what does it say?
>
> C.
> -- 
> BOFH excuse #261:
>
> The Usenet news is out of date

From raghuprasath at yahoo.com Mon Sep 3 07:25:03 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Mon, 3 Sep 2007 00:25:03 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To:
Message-ID: <237365.82904.qm@web34711.mail.mud.yahoo.com>

Hi Christian,

I am currently trying the 2.6.21 kernel, but I am having problems
detecting my SCSI controller card with this kernel version. I am
currently working on that. Once I fix this SCSI card issue I will be
able to test again and come up with the results.
Thanks
Raghu

--- Christian Kujau wrote:

> On Sun, 2 Sep 2007, Kannan Raghuprasath wrote:
> > After fsck.ext3 I am able to mount the device again
>
> Wow, fsck did quite a few things, you probably
> checked /lost+found for missing data. Did you fsck
> again after fsck fixed things?
>
> > but after another 20 hours of backup I am seeing the
> > same error again and partition remounts itself as
> > read-only.
>
> Unless you're hitting some crude, new, unidentified
> bug in ext3, I still think the corruption is hardware
> related. But the logs are clean, you say. Hm, can you
> try a current (vanilla) kernel? Lots of stuff happened
> since 2.6.11 (03/2005). Not being an expert, only
> CONFIG_LBD and CONFIG_LSF come to mind, but I
> assume the FC kernel has set these.
>
> Christian.
> -- 
> BOFH excuse #378:
>
> Operators killed by year 2000 bug bite.

From raghuprasath at yahoo.com Mon Sep 3 07:29:09 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Mon, 3 Sep 2007 00:29:09 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To: <46DBAFB7.8030803@netropol.de>
Message-ID: <439683.79478.qm@web34703.mail.mud.yahoo.com>

Hi Jan,

Thanks for the reply. I am using a recent motherboard with a Core 2 Duo
processor. Did you change the RAM? Can you kindly specify the type of
motherboard that worked for you?

Thanks
Raghu

--- Jan Stobbe wrote:

> Hi Kannan,
>
> we had similar problems with ext3. After a month of
> searching for the error we changed the mainboard of
> the server. The mainboard was too old to work properly
> with the new Areca RAID controller we used.
> Without heavy load the device was o.k.
>
>
> Best regards, Jan
>
>
> > Hi,
> >
> > I have a Fedora Core 4 machine (kernel-2.6.11-1.1369_FC4smp)
> > connected to external DAS using an
> > Ultra 320 SCSI controller card. The DAS is configured
> > as RAID 5 of 3.5 TB. I partitioned this array into two
> > partitions of size 1.9 TB and 1.6 TB. These are
> > assigned to 2 different LUNs so that these two appear
> > as two partitions on my Linux machine.
> >
> > I use this DAS for backup and after backing up for 30
> > hours I observe that the partition gets remounted as
> > read-only with the following error message in dmesg,
> >
> > EXT3-fs error (device sdb1): ext3_add_entry: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > Aborting journal on device sdb1.
> > ext3_abort called.
> > EXT3-fs error (device sdb1): ext3_journal_start_sb:
> > Detected aborted journal
> > Remounting filesystem read-only
> > EXT3-fs error (device sdb1) in start_transaction:
> > Journal has aborted
> > EXT3-fs error (device sdb1) in ext3_create: IO failure
> > EXT3-fs error (device sdb1): ext3_readdir: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > EXT3-fs error (device sdb1): ext3_readdir: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > EXT3-fs error (device sdb1): ext3_readdir: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > EXT3-fs error (device sdb1): ext3_readdir: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > EXT3-fs error (device sdb1): ext3_readdir: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> >
> >
> > Any suggestions on how to solve this
problem?
> >
> > Best regards and thanks in advance for your help,
> >
> > Raghu
>
>
> -- 
> Jan Stobbe                   Netropol Digitale Systeme GmbH
> jan.stobbe at netropol.de   Stresemannstrasse 161
> Tel: +49 40 284167-20        D-22769 Hamburg/Germany
> Fax: +49 40 284167-40        http://www.netropol.de/
>
> Registergericht:                      Geschäftsführer:
> Handelsregister Hamburg, HRB 75989    Jan Stobbe, Karl-Heinz Dellwo

From raghuprasath at yahoo.com Mon Sep 3 06:29:37 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Sun, 2 Sep 2007 23:29:37 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To:
Message-ID: <775678.54481.qm@web34706.mail.mud.yahoo.com>

Hi,

Thanks for the reply. There doesn't seem to be any device-related error.
I did try fsck.ext3 and here is the output:

=======================================================
e2fsck 1.37 (21-Mar-2005)
/dev/sdb1: recovering journal
/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes

Inode 23150593 was part of the orphaned inode list. FIXED.
Inode 23258292 was part of the orphaned inode list. FIXED.
Inode 23365847 was part of the orphaned inode list. FIXED.
Inode 23365848 was part of the orphaned inode list. FIXED.
Inode 23645369 was part of the orphaned inode list. FIXED.
Inode 23645370 was part of the orphaned inode list. FIXED. Inode 23645371 was part of the orphaned inode list. FIXED. Inode 23645372 was part of the orphaned inode list. FIXED. Deleted inode 23989501 has zero dtime. Fix? yes Inode 23989502 was part of the orphaned inode list. FIXED. Inode 23989503 was part of the orphaned inode list. FIXED. Deleted inode 23989504 has zero dtime. Fix? yes Deleted inode 24068705 has zero dtime. Fix? yes Inode 24068706 was part of the orphaned inode list. FIXED. Inode 24068707 was part of the orphaned inode list. FIXED. Inode 24068708 was part of the orphaned inode list. FIXED. Inode 24154261 was part of the orphaned inode list. FIXED. Inode 24154262 was part of the orphaned inode list. FIXED. Inode 24154263 was part of the orphaned inode list. FIXED. Inode 24154264 was part of the orphaned inode list. FIXED. Inode 24319381 was part of the orphaned inode list. FIXED. Inode 24319382 was part of the orphaned inode list. FIXED. Inode 24319383 was part of the orphaned inode list. FIXED. Inode 24319384 was part of the orphaned inode list. FIXED. Inode 24391269 was part of the orphaned inode list. FIXED. Inode 24391270 was part of the orphaned inode list. FIXED. Inode 24391271 was part of the orphaned inode list. FIXED. Inode 24391272 was part of the orphaned inode list. FIXED. Inode 24469205 was part of the orphaned inode list. FIXED. Inode 24469206 was part of the orphaned inode list. FIXED. Inode 24469207 was part of the orphaned inode list. FIXED. Inode 24469208 was part of the orphaned inode list. FIXED. Inode 24699129 was part of the orphaned inode list. FIXED. Inode 24699130 was part of the orphaned inode list. FIXED. Inode 24699131 was part of the orphaned inode list. FIXED. Inode 24699132 was part of the orphaned inode list. FIXED. Inode 24784933 was part of the orphaned inode list. FIXED. Inode 24784934 was part of the orphaned inode list. FIXED. Inode 24784935 was part of the orphaned inode list. FIXED. 
Deleted inode 24784936 has zero dtime. Fix? yes Inode 24842705 was part of the orphaned inode list. FIXED. Inode 24842706 was part of the orphaned inode list. FIXED. Inode 24842707 was part of the orphaned inode list. FIXED. Inode 24842708 was part of the orphaned inode list. FIXED. Inode 67857521 was part of the orphaned inode list. FIXED. Inode 68381037 was part of the orphaned inode list. FIXED. Inode 68509537 was part of the orphaned inode list. FIXED. Inode 68582341 was part of the orphaned inode list. FIXED. Inode 69041813 was part of the orphaned inode list. FIXED. Inode 69119221 was part of the orphaned inode list. FIXED. Inode 69369641 was part of the orphaned inode list. FIXED. Inode 69369642 was part of the orphaned inode list. FIXED. Inode 69369643 was part of the orphaned inode list. FIXED. Inode 69369644 was part of the orphaned inode list. FIXED. Inode 134340585 was part of the orphaned inode list. FIXED. Inode 134434341 was part of the orphaned inode list. FIXED. Inode 134714021 was part of the orphaned inode list. FIXED. Inode 134835666 was part of the orphaned inode list. FIXED. Inode 136770309 was part of the orphaned inode list. FIXED. Inode 136778097 was part of the orphaned inode list. FIXED. Inode 138026013 was part of the orphaned inode list. FIXED. Inode 144104178 was part of the orphaned inode list. FIXED. Inode 152740634 was part of the orphaned inode list. FIXED. Deleted inode 229911998 has zero dtime. Fix? yes Deleted inode 229912000 has zero dtime. Fix? yes Deleted inode 235725049 has zero dtime. Fix? yes Deleted inode 235725050 has zero dtime. Fix? yes Deleted inode 235725052 has zero dtime. Fix? yes Inode 237667897 was part of the orphaned inode list. FIXED. Inode 242319975 was part of the orphaned inode list. FIXED. Inode 242319976 was part of the orphaned inode list. FIXED. Duplicate blocks found... invoking duplicate block passes. 
Pass 1B: Rescan for duplicate/bad blocks Duplicate/bad block(s) in inode 16524: 32455728 32455830 32472317 Duplicate/bad block(s) in inode 16534: 32488498 32488564 32488689 32504895 32505011 32505013 32505022 Duplicate/bad block(s) in inode 16574: 40774646 40827888 40827889 40827890 40827891 40827892 40827893 40827901 40827902 40827903 Duplicate/bad block(s) in inode 16994: 74327918 74381301 74381302 74381303 74381304 74381305 74381306 74381307 74381308 74381309 Duplicate/bad block(s) in inode 2113541: 4657478 4661666 4674020 4710821 Duplicate/bad block(s) in inode 2113551: 4714992 4714993 4714994 4714995 4714996 4714997 4714998 4714999 4715000 4715001 4715002 4715003 4715004 4715005 4715006 4715007 4715248 4715249 4715250 4715251 4715252 Duplicate/bad block(s) in inode 2113601: 7206128 7206129 7206130 7206131 7206138 7206139 7206140 7206141 7206142 7206143 Duplicate/bad block(s) in inode 2113621: 8515824 8515825 8515826 8515827 8515828 8515829 8515830 8515837 8515838 8515839 Duplicate/bad block(s) in inode 16793627: 35000268 35000283 35004238 35012482 35016461 35016467 35016607 35041073 Duplicate/bad block(s) in inode 16793637: 35049360 35053476 35061525 Duplicate/bad block(s) in inode 16793647: 35714288 35714289 35714290 35714291 35714292 35714293 35714294 35714301 35714302 35714303 Duplicate/bad block(s) in inode 25247794: 52760866 52760918 52760985 52761002 52765090 52765095 52769027 52769045 52769112 52777269 52777409 52793748 52797741 52818227 52818431 Duplicate/bad block(s) in inode 67125291: 136578816 136578817 136578818 136578819 136578820 136578821 136578822 136578829 136578830 136578831 Duplicate/bad block(s) in inode 67125331: 138219553 138219554 138219557 138219569 138219570 138219575 138219576 138219577 138219580 138219644 138219646 138219669 138219724 138219764 138223636 138223666 138223670 138223672 138223714 138223761 138223763 138223852 138227721 138227723 138227725 138227747 138227750 138227754 138227759 138227760 138227762 138227763 138227765 
138227766 138227769 138227770 138227771 138227772 138227774 138227775 138227796 138227824 138227831 138227896 138227902 138227903 138227918 138235912 138235942 138235955 138235964 138236023 138240019 138240059 138240090 138240161 138240186 138244121 138244128 138244144 138244152 138244154 138244155 138244189 138244199 138244218 138244276 138244335 138252389 138252414 138252525 138252526 138256435 138256438 138256471 138256543 138256551 138260501 138260504 138260524 138260534 138260537 138260539 138260582 138260593 138260608 138260637 138260660 138260671 138264637 138264812 138268837 138268897 138272778 138272795 138272809 138272887 138272971 138272989 138273010 138277037 138277056 138277059 138277104 Duplicate/bad block(s) in inode 67125361: 139920385 139920559 139924547 139969634 139982064 139982068 139982069 139982070 139982071 139982072 139982073 139982074 139982075 139982076 139982077 139982078 139982079 Duplicate/bad block(s) in inode 67125371: 140115184 140115185 140115186 140115187 140115188 140115189 140115190 140115197 140115198 140115199 Duplicate/bad block(s) in inode 134234322: 278333096 278341216 278341217 278341218 278341219 278341220 278341221 278341222 278341223 278341224 278341225 278341226 278341227 278341228 278341229 278341230 278341231 278349543 278357756 278374090 278378233 278386314 278386328 278390427 278394608 278394609 278394610 278394611 278394612 278394613 278394614 278394615 278394616 278394617 278394618 278394619 278394620 278394621 278394622 278394623 Duplicate/bad block(s) in inode 134234362: 280430697 280430753 280431012 280434874 280438883 280447045 280447211 280451165 280455255 280467606 280467644 280484078 280488199 280488342 280492052 280492272 280492273 280492274 280492275 280492276 280492277 280492278 280492279 280492280 280492281 280492282 280492283 280492284 280492285 280492286 280492287 280492528 280492529 280492530 280492531 280492532 280492533 280492534 280492535 280492536 280492537 280492538 280492539 280492540 280492541 
280492542 280492543 Pass 1C: Scan directories for inodes with dup blocks. Pass 1D: Reconciling duplicate blocks (There are 18 inodes containing duplicate/bad blocks.) File /20070830.120300.ts (inode #16524, mod time Thu Aug 30 12:13:22 2007) has 3 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.121100.ts (inode #16534, mod time Thu Aug 30 12:21:24 2007) has 7 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.124300.ts (inode #16574, mod time Thu Aug 30 12:53:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.181900.ts (inode #16994, mod time Thu Aug 30 18:29:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.174700.ts (inode #2113541, mod time Wed Aug 29 17:57:21 2007) has 4 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.175500.ts (inode #2113551, mod time Wed Aug 29 18:05:25 2007) has 21 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.183500.ts (inode #2113601, mod time Wed Aug 29 18:45:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070829.185100.ts (inode #2113621, mod time Wed Aug 29 19:01:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.001100.ts (inode #16793627, mod time Thu Aug 30 00:21:23 2007) has 8 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.001900.ts (inode #16793637, mod time Thu Aug 30 00:29:24 2007) has 3 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. 
File /20070830.002700.ts (inode #16793647, mod time Thu Aug 30 00:37:22 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.084300.ts (inode #25247794, mod time Thu Aug 30 08:53:23 2007) has 15 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.033900.ts (inode #67125291, mod time Thu Aug 30 03:49:23 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.041100.ts (inode #67125331, mod time Thu Aug 30 04:21:23 2007) has 104 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.043500.ts (inode #67125361, mod time Thu Aug 30 04:45:23 2007) has 17 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.044300.ts (inode #67125371, mod time Thu Aug 30 04:53:22 2007) has 10 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.073100.ts (inode #134234322, mod time Thu Aug 30 07:41:23 2007) has 40 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. File /20070830.080300.ts (inode #134234362, mod time Thu Aug 30 08:13:22 2007) has 47 duplicate block(s), shared with 0 file(s): Duplicated blocks already reassigned or cloned. Pass 2: Checking directory structure Entry '20070831.101900.ts' in / (2) has deleted/unused inode 17567. Clear? yes Entry '20070831.101900.tsrate' in / (2) has deleted/unused inode 17568. Clear? yes Entry '20070831.101900.0000.info' in / (2) has deleted/unused inode 17569. Clear? yes Entry '20070831.101900.0060.info' in / (2) has deleted/unused inode 17570. Clear? yes Entry '20070831.101900.0120.info' in / (2) has deleted/unused inode 17571. Clear? yes Entry '20070831.101900.0180.info' in / (2) has deleted/unused inode 17572. Clear? 
yes Entry '20070831.101900.0240.info' in / (2) has deleted/unused inode 17573. Clear? yes Entry '20070831.101900.0300.info' in / (2) has deleted/unused inode 17574. Clear? yes Entry '20070831.101900.0360.info' in / (2) has deleted/unused inode 17575. Clear? yes Entry '20070831.101900.0420.info' in / (2) has deleted/unused inode 17576. Clear? yes Entry '20070831.102700.ts' in / (2) has deleted/unused inode 17577. Clear? yes Entry '20070831.102700.tsrate' in / (2) has deleted/unused inode 17578. Clear? yes Entry '20070831.102700.0000.info' in / (2) has deleted/unused inode 17579. Clear? yes Entry '20070831.102700.0060.info' in / (2) has deleted/unused inode 17580. Clear? yes Entry '20070831.102700.0120.info' in / (2) has deleted/unused inode 17581. Clear? yes Entry '20070831.102700.0180.info' in / (2) has deleted/unused inode 17582. Clear? yes Entry '20070831.102700.0240.info' in / (2) has deleted/unused inode 17583. Clear? yes Entry '20070831.102700.0300.info' in / (2) has deleted/unused inode 17584. Clear? yes Entry '20070831.102700.0360.info' in / (2) has deleted/unused inode 17585. Clear? yes Entry '20070831.102700.0420.info' in / (2) has deleted/unused inode 17586. Clear? yes Entry '20070831.103500.ts' in / (2) has deleted/unused inode 17587. Clear? yes Entry '20070831.103500.tsrate' in / (2) has deleted/unused inode 17588. Clear? yes Entry '20070831.103500.0000.info' in / (2) has deleted/unused inode 17589. Clear? yes Entry '20070831.103500.0060.info' in / (2) has deleted/unused inode 17590. Clear? yes Entry '20070831.103500.0120.info' in / (2) has deleted/unused inode 17591. Clear? yes Entry '20070831.103500.0180.info' in / (2) has deleted/unused inode 17592. Clear? yes Entry '20070831.103500.0240.info' in / (2) has deleted/unused inode 17593. Clear? yes Entry '20070831.103500.0300.info' in / (2) has deleted/unused inode 17594. Clear? yes Entry '20070831.103500.0360.info' in / (2) has deleted/unused inode 17595. Clear? 
yes Entry '20070831.103500.0420.info' in / (2) has deleted/unused inode 17596. Clear? yes Entry '20070831.104300.ts' in / (2) has deleted/unused inode 17597. Clear? yes Entry '20070831.104300.tsrate' in / (2) has deleted/unused inode 17598. Clear? yes Entry '20070831.104300.0000.info' in / (2) has deleted/unused inode 17599. Clear? yes Entry '20070831.104300.0060.info' in / (2) has deleted/unused inode 17600. Clear? yes Entry '20070831.104300.0120.info' in / (2) has deleted/unused inode 17601. Clear? yes Entry '20070831.104300.0180.info' in / (2) has deleted/unused inode 17602. Clear? yes Entry '20070831.104300.0240.info' in / (2) has deleted/unused inode 17603. Clear? yes Entry '20070831.104300.0300.info' in / (2) has deleted/unused inode 17604. Clear? yes Entry '20070831.104300.0360.info' in / (2) has deleted/unused inode 17605. Clear? yes Entry '20070831.104300.0420.info' in / (2) has deleted/unused inode 17606. Clear? yes Entry '20070831.105100.ts' in / (2) has deleted/unused inode 17607. Clear? yes Entry '20070831.105100.tsrate' in / (2) has deleted/unused inode 17608. Clear? yes Entry '20070831.105100.0000.info' in / (2) has deleted/unused inode 17609. Clear? yes Entry '20070831.105100.0060.info' in / (2) has deleted/unused inode 17610. Clear? yes Entry '20070831.105100.0120.info' in / (2) has deleted/unused inode 17611. Clear? yes Entry '20070831.105100.0180.info' in / (2) has deleted/unused inode 17612. Clear? yes Entry '20070831.105100.0240.info' in / (2) has deleted/unused inode 17613. Clear? yes Entry '20070831.105100.0300.info' in / (2) has deleted/unused inode 17614. Clear? yes Entry '20070831.105100.0360.info' in / (2) has deleted/unused inode 17615. Clear? yes Entry '20070831.105100.0420.info' in / (2) has deleted/unused inode 17616. Clear? yes Entry '20070831.105900.ts' in / (2) has deleted/unused inode 17617. Clear? yes Entry '20070831.105900.tsrate' in / (2) has deleted/unused inode 17618. Clear? 
yes Entry '20070831.105900.0000.info' in / (2) has deleted/unused inode 17619. Clear? yes Entry '20070831.105900.0060.info' in / (2) has deleted/unused inode 17620. Clear? yes Entry '20070831.105900.0120.info' in / (2) has deleted/unused inode 17621. Clear? yes Entry '20070831.105900.0180.info' in / (2) has deleted/unused inode 17622. Clear? yes Entry '20070831.105900.0240.info' in / (2) has deleted/unused inode 17623. Clear? yes Entry '20070831.105900.0300.info' in / (2) has deleted/unused inode 17624. Clear? yes Entry '20070831.105900.0360.info' in / (2) has deleted/unused inode 17625. Clear? yes Entry '20070831.105900.0420.info' in / (2) has deleted/unused inode 17626. Clear? yes Entry '20070831.110700.ts' in / (2) has deleted/unused inode 17627. Clear? yes Entry '20070831.110700.tsrate' in / (2) has deleted/unused inode 17628. Clear? yes Entry '20070831.110700.0000.info' in / (2) has deleted/unused inode 17629. Clear? yes Entry '20070831.110700.0060.info' in / (2) has deleted/unused inode 17630. Clear? yes Entry '20070831.110700.0120.info' in / (2) has deleted/unused inode 17631. Clear? yes Entry '20070831.110700.0180.info' in / (2) has deleted/unused inode 17632. Clear? yes Entry '20070831.110700.0240.info' in / (2) has deleted/unused inode 17633. Clear? yes Entry '20070831.110700.0300.info' in / (2) has deleted/unused inode 17634. Clear? yes Entry '20070831.110700.0360.info' in / (2) has deleted/unused inode 17635. Clear? yes Entry '20070831.110700.0420.info' in / (2) has deleted/unused inode 17636. Clear? yes Entry '20070831.111500.ts' in / (2) has deleted/unused inode 17637. Clear? yes Entry '20070831.111500.tsrate' in / (2) has deleted/unused inode 17638. Clear? yes Entry '20070831.111500.0000.info' in / (2) has deleted/unused inode 17639. Clear? yes Entry '20070831.111500.0060.info' in / (2) has deleted/unused inode 17640. Clear? yes Entry '20070831.111500.0120.info' in / (2) has deleted/unused inode 17641. Clear? 
yes Entry '20070831.111500.0180.info' in / (2) has deleted/unused inode 17642. Clear? yes Entry '20070831.111500.0240.info' in / (2) has deleted/unused inode 17643. Clear? yes Entry '20070831.111500.0300.info' in / (2) has deleted/unused inode 17644. Clear? yes Entry '20070831.111500.0360.info' in / (2) has deleted/unused inode 17645. Clear? yes Entry '20070831.111500.0420.info' in / (2) has deleted/unused inode 17646. Clear? yes Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached zero-length inode 17565. Clear? yes Unattached zero-length inode 17566. Clear? yes Free blocks count wrong for group #142 (0, counted=37). Fix? yes Free blocks count wrong for group #218 (0, counted=10). Fix? yes Free blocks count wrong for group #259 (4, counted=14). Fix? yes Free blocks count wrong for group #991 (0, counted=10). Fix? yes Free blocks count wrong for group #1068 (4, counted=15). Fix? yes Free blocks count wrong for group #1088 (4, counted=14). Fix? yes Free blocks count wrong for group #1245 (4, counted=14). Fix? yes Free blocks count wrong for group #1610 (0, counted=30). Fix? yes Free blocks count wrong for group #2269 (0, counted=10). Fix? yes Free blocks count wrong for group #4168 (4, counted=14). Fix? yes Free blocks count wrong for group #4219 (0, counted=113). Fix? yes Free blocks count wrong for group #4271 (6, counted=36). Fix? yes Free blocks count wrong for group #4275 (13, counted=23). Fix? yes Free blocks count wrong for group #8494 (0, counted=74). Fix? yes Free blocks count wrong for group #8558 (6, counted=90). Fix? yes Free blocks count wrong (360044482, counted=360044941). Fix? yes Inode bitmap differences: -(17567--17646) -184754182 -184754213 -184754222 -184754278 -184754285 -184754318 -(184754373--184754374) -184754405 -184754414 -184754438 -184754445 -184754470 -184755142 -184755181 -184755213 -184755461 -184755501 -184755504 -184755525 -184755534 Fix? 
yes

/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb1: 2498/243204096 files (0.1% non-contiguous), 126353048/486397989 blocks
=====================================================

After fsck.ext3 I am able to mount the device again, but after another
20 hours of backup I am seeing the same error again and the partition
remounts itself read-only.

Please provide your suggestions on solving this problem.

Thank you once again.

Best Regards,
Raghu

--- Christian Kujau wrote:

> On Wed, 29 Aug 2007, Kannan Raghuprasath wrote:
> > EXT3-fs error (device sdb1): ext3_add_entry: bad entry
> > in directory #2: directory entry across blocks -
> > offset=1080, inode=135216, rec_len=4132, name_len=25
> > Aborting journal on device sdb1.
> > ext3_abort called.
> > EXT3-fs error (device sdb1): ext3_journal_start_sb:
> > Detected aborted journal
>
> Probably too late, but: are there any device-related errors in the
> log? Have you checked the device for errors? (e.g. a simple
> dd if=/dev/sdb of=/dev/null should do).
>
> If the device (and the cabling) is fine, did you try fsck.ext3 yet?
> If so, what does it say?
>
> C.
> --
> BOFH excuse #261:
>
> The Usenet news is out of date

From raghuprasath at yahoo.com Mon Sep 3 07:39:09 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Mon, 3 Sep 2007 00:39:09 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
In-Reply-To: <237365.82904.qm@web34711.mail.mud.yahoo.com>
Message-ID: <44394.8772.qm@web34715.mail.mud.yahoo.com>

Hi Christian,

> > Wow, fsck did quite a few things, you probably checked /lost+found
> > for missing data. Did you fsck again after fsck fixed things?

No, I didn't run fsck again after it had fixed things.
Thanks,
Raghu

--- Kannan Raghuprasath wrote:

> Hi Christian,
>
> I am currently trying the 2.6.21 kernel, but I am having problems
> detecting my SCSI controller card with this kernel version. I am
> currently working on that. Once I fix this SCSI card issue I will be
> able to test again and come up with the results.
>
> Thanks
> Raghu
>
> --- Christian Kujau wrote:
>
> > On Sun, 2 Sep 2007, Kannan Raghuprasath wrote:
> > > After fsck.ext3 I am able to mount the device again
> >
> > Wow, fsck did quite a few things, you probably checked /lost+found
> > for missing data. Did you fsck again after fsck fixed things?
> >
> > > but after another 20 hours of backup I am seeing the
> > > same error again and the partition remounts itself
> > > read-only.
> >
> > Unless you're hitting some crude, new, unidentified bug in ext3,
> > I still think the corruption is hardware related. But the logs are
> > clean, you say. Hm, can you try a current (vanilla) kernel? Lots of
> > stuff happened since 2.6.11 (03/2005). Not being an expert, only
> > CONFIG_LBD and CONFIG_LSF come to mind, but I assume the FC kernel
> > has set these.
> >
> > Christian.
> > --
> > BOFH excuse #378:
> >
> > Operators killed by year 2000 bug bite.

From tpo2 at sourcepole.ch Mon Sep 3 15:01:49 2007
From: tpo2 at sourcepole.ch (Tomas Pospisek ML)
Date: Mon, 03 Sep 2007 15:01:49 +0000
Subject: Second Block on Partition overwritten with 0xFF
Message-ID:

Hello everybody

we're running a small population of lightly embedded machines with the
following specs:

System: +- standard intel box
FS:     ext3 (defaults,errors=remount-ro,noatime)
HD:     TRANSCEND, ATA DISK drive, Compact Flash (CF), 2000880 sectors
        (1024 MB) w/2KiB Cache, CHS=1985/16/63
Driver: Standard IDE Driver
        ICH4: chipset revision 2
        ICH4: not 100% native mode: will probe irqs later
        ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:pio, hdb:pio
        ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
kernel: 2.6.15.6 #1 PREEMPT Sat Mar 11 00:56:41 CET 2006 i686 GNU/Linux

ext3 was chosen in the hope of making the system more resilient to
power failures. The systems run on a UPS, but unfortunately some
operators will just pull the power plug (although they're instructed
not to).

What we have now experienced multiple times is that a system runs just
fine, with absolutely no complaints in dmesg/kern.log, until it is
rebooted (shutdown -r now). At that point, *very rarely*, GRUB is no
longer able to read the boot filesystem (Error 17).
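
When GRUB stops recognising a filesystem like this, one thing worth checking is whether the ext3 superblock is still in place. The following is a minimal Python sketch for illustration, not something from the original report. It relies on the ext2/3 on-disk layout, where the superblock starts 1024 bytes into the partition and carries the 16-bit little-endian magic number 0xEF53 at offset 56. The function names and the image file name `hdc1.img` are hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the partition
MAGIC_OFFSET = 56          # position of s_magic, relative to the superblock start
EXT_MAGIC = 0xEF53         # same value for ext2, ext3 and ext4


def has_ext_magic(partition_bytes: bytes) -> bool:
    """Return True if the buffer carries the ext2/3/4 superblock magic."""
    pos = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    if len(partition_bytes) < pos + 2:
        return False  # buffer too short to contain the magic field
    (magic,) = struct.unpack_from("<H", partition_bytes, pos)
    return magic == EXT_MAGIC


def check_image(path: str) -> bool:
    """Read the first 2 KiB of a partition image and test for the magic.

    `path` is a hypothetical image of the boot partition, for example
    one produced with: dd if=/dev/hdc1 of=hdc1.img bs=512 count=4
    """
    with open(path, "rb") as f:
        return has_ext_magic(f.read(2048))
```

A call such as `check_image("hdc1.img")` returning False would mean the bytes at partition offset 1080 no longer hold the magic number, so the superblock area itself has been overwritten.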
I've checked the on-disk data and have discovered that 0x200-0x1c00 is
overwritten with 0xff, then a single 0x0f, and after that 0x00 until
0x207f.

That is, the second to the sixteenth on-disk blocks have been
overwritten:

000001e0  53 59 53 4d 53 44 4f 53  20 20 20 53 59 53 7f 01  |SYSMSDOS   SYS..|
000001f0  00 41 bb 00 07 60 66 6a  00 e9 3b ff 00 00 00 00  |.A?..`fj.?;?....|
00000200  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |????????????????|
*
00001c00  ff 0f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |?...............|
00001c10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002080  ed 41 00 00 00 04 00 00  1e 39 a0 46 a6 6a dd 45  |?A.......9 F?j?E|

Our project does no hardware-level operations. All access is through
regular file operations only. Thus there's no way we're aware of that
our software would be changing blocks on disk directly.

What's striking about the problem above is that the first affected
block (0x200) starts _before_ the on-disk filesystem, which starts at
0x400.

My question is: does the ext3 driver _ever_ write outside of its own
space on disk - i.e. into 0x000-0x400? That is, can we say with
certainty that it's _not_ the ext3 driver causing the problem?

What else could cause the problem then? We don't see any sign of a
problem before reboot, only after. Could the IDE driver be the problem?
Or is it the IDE CF card hardware?

I've done a dd if=/dev/hdc of=/dev/null and there was absolutely no
trouble visible (nothing in kern.log/dmesg), thus the card does not
seem to be broken on the physical level and doesn't have bad blocks
that would fail on read.

Does this ring a bell with anybody?
*t

From tpo2 at sourcepole.ch Wed Sep 5 08:58:15 2007
From: tpo2 at sourcepole.ch (Tomas Pospisek ML)
Date: Wed, 05 Sep 2007 08:58:15 +0000
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
Message-ID:

Can anybody here give me a hint about the problem?
Particularly:

> My question is: does the ext3 driver _ever_ write outside of its own
> space on disk - i.e. into 0x000-0x400? That is, can we say with
> certainty that it's _not_ the ext3 driver causing the problem?

?
*t

From lists at nerdbynature.de Wed Sep 5 11:12:26 2007
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 5 Sep 2007 13:12:26 +0200 (CEST)
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
References:
Message-ID: <39474.62.180.231.196.1188990746.squirrel@www.housecafe.de>

On Wed, September 5, 2007 10:58, Tomas Pospisek ML wrote:
>> My question is: does the ext3 driver _ever_ write outside of its own
>> space on disk - i.e. into 0x000-0x400? That is, can we say with
>> certainty that it's _not_ the ext3 driver causing the problem?

Not being an expert, I'd say it should not write outside its assigned
space. When I format sda1 with *fs, I expect that only sda1 will be
touched by the fs, nothing else. I cannot think of a corner case where
the fs has to touch something else.

So, assuming that's correct and assuming ext3 is not buggy, I'd exclude
ext3 from being the one writing to this particular space on disk.

C.
--
BOFH excuse #442:

Trojan horse ran out of hay

From tytso at mit.edu Thu Sep 6 06:10:31 2007
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 6 Sep 2007 02:10:31 -0400
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
References:
Message-ID: <20070906061031.GD2787@thunk.org>

On Wed, Sep 05, 2007 at 08:58:15AM +0000, Tomas Pospisek ML wrote:
>
> Can anybody here give me a hint about the problem? Particularly:
>
> > My question is: does the ext3 driver _ever_ write outside of its own
> > space on disk - i.e. into 0x000-0x400? That is, can we say with
> > certainty that it's _not_ the ext3 driver causing the problem?

The ext3 driver physically can not write outside of its space on disk,
since it accesses it via some device whose boundaries are defined by
the partition table, for example, /dev/hda2.
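
The point about partition boundaries can be made concrete with a little arithmetic. In the Python sketch below, which is an illustration and not part of the original thread, every offset a filesystem uses is relative to the start of its partition device, and I/O that would fall outside the partition is refused. The function name, the start sector and the partition size are made-up values for the example, not the real layout of the CF card.

```python
SECTOR_SIZE = 512


def to_disk_offset(part_start_sector: int, part_sectors: int, fs_offset: int) -> int:
    """Translate a partition-relative byte offset to an absolute disk offset.

    Raises ValueError for offsets outside the partition, mirroring the
    bounds check applied to I/O on a partition device.
    """
    part_bytes = part_sectors * SECTOR_SIZE
    if not 0 <= fs_offset < part_bytes:
        raise ValueError("offset outside partition")
    return part_start_sector * SECTOR_SIZE + fs_offset


# Hypothetical layout: the partition starts at sector 63 and spans
# 2000817 sectors (assumed values, chosen only for illustration).
START, SIZE = 63, 2000817

first_byte_on_disk = to_disk_offset(START, SIZE, 0)      # 32256
superblock_on_disk = to_disk_offset(START, SIZE, 1024)   # 33280
```

Under these assumed numbers, no offset the filesystem can express maps to a disk byte below 32256, which is where the partition table and anything before the partition live.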
>From what you describe, I would certainly be suspicious of the CF
hardware.

                                                - Ted

From tpo2 at sourcepole.ch Thu Sep 6 10:09:12 2007
From: tpo2 at sourcepole.ch (Tomas Pospisek ML)
Date: Thu, 06 Sep 2007 10:09:12 +0000
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To: <20070906061031.GD2787@thunk.org>
Message-ID:

On 9/6/2007, "Theodore Tso" wrote:

>On Wed, Sep 05, 2007 at 08:58:15AM +0000, Tomas Pospisek ML wrote:
>>
>> Can anybody here give me a hint about the problem? Particularly:
>>
>> > My question is: does the ext3 driver _ever_ write outside of its own
>> > space on disk - i.e. into 0x000-0x400? That is, can we say with
>> > certainty that it's _not_ the ext3 driver causing the problem?
>
>The ext3 driver physically can not write outside of its space on disk,
>since it accesses it via some device whose boundaries are defined by
>the partition table, for example, /dev/hda2.

Yes, however, as I read in [1], *each* partition with an ext2/3 FS on
it starts with a boot sector, and the first block group starts (by
default) at 0x400. Thus, as I understand it, it *would* be possible for
the ext3 driver to physically write to those first sectors inside its
partition. Does the ext2/3 driver *ever* touch anything before the
first block group?

>>From what you describe, I would certainly be suspicious of the CF
>hardware.

Well, I certainly am; however, I have not found any way forward that
would let me point my finger at it.
*t

[1] http://web.mit.edu/tytso/www/linux/ext2intro.html (chapter "Physical Structure")

From lists at nerdbynature.de Thu Sep 6 19:43:31 2007
From: lists at nerdbynature.de (Christian Kujau)
Date: Thu, 6 Sep 2007 21:43:31 +0200 (CEST)
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
References:
Message-ID:

On Thu, 6 Sep 2007, Tomas Pospisek ML wrote:
> default) at 0x400. Thus, as I understand it, it *would* be possible for
> the ext3 driver to physically write to those first sectors inside its
> partition.                                                 ^^^^^^

Yes, ext3 will write *inside* its assigned partition, but not outside.

--
BOFH excuse #357:

I'd love to help you -- it's just that the Boss won't let me near the computer.

From tpo2 at sourcepole.ch Thu Sep 6 21:02:55 2007
From: tpo2 at sourcepole.ch (Tomas Pospisek's Mailing Lists)
Date: Thu, 6 Sep 2007 23:02:55 +0200 (CEST)
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
References:
Message-ID:

On Thu, 6 Sep 2007, Christian Kujau wrote:

> On Thu, 6 Sep 2007, Tomas Pospisek ML wrote:
>> default) at 0x400. Thus, as I understand it, it *would* be possible for
>> the ext3 driver to physically write to those first sectors inside its
>> partition.                                                 ^^^^^^
>
> Yes, ext3 will write *inside* its assigned partition, but not outside.

Thanks, however it seems I can't get across what I need to know -
sorry for that. I *do* know that ext3 will write to its own partition
only. But once mke2fs has run:

* will ext2/3 *ever* write to the first 4 sectors on *its* partition?

Same question restated: is it possible that ext2/3 will write into the
space before the first block group [1]?
*t

[1] http://web.mit.edu/tytso/www/linux/ext2intro.html (chapter "Physical Structure")

--
-----------------------------------------------------------
Tomas Pospisek
http://sourcepole.com - Linux & Open Source Solutions
-----------------------------------------------------------

From adilger at clusterfs.com Thu Sep 6 23:18:49 2007
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 6 Sep 2007 17:18:49 -0600
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To:
References:
Message-ID: <20070906231849.GV5377@schatzie.adilger.int>

On Sep 06, 2007 23:02 +0200, Tomas Pospisek's Mailing Lists wrote:
> On Thu, 6 Sep 2007, Christian Kujau wrote:
> >On Thu, 6 Sep 2007, Tomas Pospisek ML wrote:
> >>default) at 0x400. Thus, as I understand it, it *would* be possible for
> >>the ext3 driver to physically write to those first sectors inside its
> >>partition.                                                 ^^^^^^
> >
> >Yes, ext3 will write *inside* its assigned partition, but not outside.
>
> Thanks, however it seems I can't get across what I need to know -
> sorry for that. I *do* know that ext3 will write to its own partition
> only. But once mke2fs has run:
>
> * will ext2/3 *ever* write to the first 4 sectors on *its* partition?
>
> Same question restated: is it possible that ext2/3 will write into the
> space before the first block group [1]?

The ext2/3/4 superblock is at offset 1024 bytes. It is written by
marking the buffer it is in dirty. If the filesystem blocksize is
> 1024 bytes then the whole block will be written to disk (including
the first sectors).

That said, the buffer cache is coherent when written by the filesystem
and when written via /dev/XXX, so any modifications made to the first
sectors should be rewritten each time the superblock is marked dirty.
The ext3 code will never itself modify those sectors.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From tpo2 at sourcepole.ch Fri Sep 7 08:46:56 2007
From: tpo2 at sourcepole.ch (Tomas Pospisek's Mailing Lists)
Date: Fri, 7 Sep 2007 10:46:56 +0200 (CEST)
Subject: Second Block on Partition overwritten with 0xFF
In-Reply-To: <20070906231849.GV5377@schatzie.adilger.int>
References: <20070906231849.GV5377@schatzie.adilger.int>
Message-ID:

On Thu, 6 Sep 2007, Andreas Dilger wrote:

> On Sep 06, 2007 23:02 +0200, Tomas Pospisek's Mailing Lists wrote:
>> On Thu, 6 Sep 2007, Christian Kujau wrote:
>>> On Thu, 6 Sep 2007, Tomas Pospisek ML wrote:
>>>> default) at 0x400. Thus, as I understand it, it *would* be possible for
>>>> the ext3 driver to physically write to those first sectors inside its
>>>> partition.                                                 ^^^^^^
>>>
>>> Yes, ext3 will write *inside* its assigned partition, but not outside.
>>
>> Thanks, however it seems I can't get across what I need to know -
>> sorry for that. I *do* know that ext3 will write to its own partition
>> only.
But once mke2fs has run: >> >> * will ext2/3 *ever* write to the first 4 sectors on *its* partition? >> >> Same question restated: is it possible that ext2/3 will write into the >> space before the first block group [1]? > > The ext2/3/4 superblock is at offset 1024 bytes. It is written by marking > the buffer it is in dirty. If the filesystem blocksize is > 1024 bytes > then the whole block will be written to disk (including the first sectors). > > That said, the buffer cache is coherent when written by the filesystem and > when written via /dev/XXX so any modifications made to the first sectors > should be rewritten each time the superblock is marked dirty. The ext3 > code will never itself modify those sectors. Thanks! *t -- ----------------------------------------------------------- Tomas Pospisek http://sourcepole.com - Linux & Open Source Solutions ----------------------------------------------------------- From tpo2 at sourcepole.ch Sun Sep 9 23:06:53 2007 From: tpo2 at sourcepole.ch (Tomas Pospisek's Mailing Lists) Date: Mon, 10 Sep 2007 01:06:53 +0200 (CEST) Subject: Second Block on Partition overwritten with 0xFF In-Reply-To: <20070906231849.GV5377@schatzie.adilger.int> References: <20070906231849.GV5377@schatzie.adilger.int> Message-ID: On Thu, 6 Sep 2007, Andreas Dilger wrote: > On Sep 06, 2007 23:02 +0200, Tomas Pospisek's Mailing Lists wrote: >> On Thu, 6 Sep 2007, Christian Kujau wrote: >>> On Thu, 6 Sep 2007, Tomas Pospisek ML wrote: >>>> default) at 0x400. Thus as I understand it, it *would* be possible for >>>> the ext3 driver to pysically write to those first sectors inside its >>>> partition. ^^^^^^ >>> >>> Yes, ext3 will write *inside* its assigned partition, but not outside. >> >> Thanks, however it seems I can not get through what I need to know - >> sorry for that. I *do* know that ext3 will only write to its partition >> only. But once mke2fs has run: >> >> * will ext2/3 *ever* write to the first 4 sectors on *its* partition? 
>> >> Same question restated: is it possible that ext2/3 will write into the >> space before the first block group [1]? > > The ext2/3/4 superblock is at offset 1024 bytes. It is written by marking > the buffer it is in dirty. If the filesystem blocksize is > 1024 bytes > then the whole block will be written to disk (including the first sectors). > > That said, the buffer cache is coherent when written by the filesystem and > when written via /dev/XXX so any modifications made to the first sectors > should be rewritten each time the superblock is marked dirty. The ext3 > code will never itself modify those sectors. I just remembered, that once the problem occured when there was very high memory pressure. I.e. the OOM killer went around and killed applications, the machine rebooted, at which point the FS was broken. So a naive ad hoc theory of mine for the FS corruption would be that the FS was unmounted at a moment when processes wouldn't receive any more memory from the OS (due to OOM) and thus umount would flush/write out the first block (I believe it needs to obligatorily clear the dirty FS flag at umount) which it failed to properly allocate before?!? *t -- ----------------------------------------------------------- Tomas Pospisek http://sourcepole.com - Linux & Open Source Solutions ----------------------------------------------------------- From mvolaski at aecom.yu.edu Tue Sep 11 00:32:33 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Mon, 10 Sep 2007 20:32:33 -0400 Subject: Spontaneous development of supremely large files on different ext3 filesystems Message-ID: I have come across two files, essentially untouched in years, on two different ext3 filesystems on the same server, Gentoo AMD 64-bit with kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously becoming supremely large: Filesystem one Inode 16257874, i_size is 18014398562775391, should be 53297152 Filesystem two Inode 2121855, i_size is 35184386120704, should be 14032896. 
Both were discovered during an ordinary backup operation (via EMC Insiginia's Retrospect Linux client). The backup runs daily and so one day, one file must have grew spontaneously to this size and then on another day, it happened to the second file, which is on a second filesystem. The backup attempt generated repeated errors: EXT3-fs warning (device dm-2): ext3_block_to_path: block > big Both filesystems are running on different logical volumes, but underlying that is are drbd network raid devices and underlying that is a RAID 6-based SATA disk array. -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From darkonc at gmail.com Wed Sep 12 00:01:18 2007 From: darkonc at gmail.com (Stephen Samuel) Date: Tue, 11 Sep 2007 17:01:18 -0700 Subject: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: Message-ID: <6cd50f9f0709111701l6c64f4c8jee4195c430b9fc86@mail.gmail.com> One simple point: bash-3.2$bc -ql obase=16 35184386120704; 14032896 200000D61C00 D62000 18014398562775391; 53297152 400000032D315F 32D4000 The filesize is basically the same, except for the addition of a stray bit, way off in left field. (( Note that both of the 'old' file sizes are multiples of 8K )) On 9/10/07, Maurice Volaski wrote: > I have come across two files, essentially untouched in years, on two > different ext3 filesystems on the same server, Gentoo AMD 64-bit with > kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously > becoming supremely large: > > Filesystem one > Inode 16257874, i_size is 18014398562775391, should be 53297152 > > Filesystem two > Inode 2121855, i_size is 35184386120704, should be 14032896. > > Both were discovered during an ordinary backup operation (via EMC > Insiginia's Retrospect Linux client). 
> > The backup runs daily and so one day, one file must have grew > spontaneously to this size and then on another day, it happened to > the second file, which is on a second filesystem. The backup attempt > generated repeated errors: > > EXT3-fs warning (device dm-2): ext3_block_to_path: block > big > > Both filesystems are running on different logical volumes, but > underlying that is are drbd network raid devices and underlying that > is a RAID 6-based SATA disk array. > -- > > Maurice Volaski, mvolaski at aecom.yu.edu > Computing Support, Rose F. Kennedy Center > Albert Einstein College of Medicine of Yeshiva University > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -- Stephen Samuel http://www.bcgreen.com 778-861-7641 From adilger at clusterfs.com Wed Sep 12 02:58:15 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 11 Sep 2007 20:58:15 -0600 Subject: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <6cd50f9f0709111701l6c64f4c8jee4195c430b9fc86@mail.gmail.com> References: <6cd50f9f0709111701l6c64f4c8jee4195c430b9fc86@mail.gmail.com> Message-ID: <20070912025815.GA5377@schatzie.adilger.int> On Sep 11, 2007 17:01 -0700, Stephen Samuel wrote: > One simple point: > bash-3.2$bc -ql > obase=16 > 35184386120704; 14032896 > 200000D61C00 > D62000 > 18014398562775391; 53297152 > 400000032D315F > 32D4000 > > The filesize is basically the same, except for the addition of a stray > bit, way off in left field. Yes, it looks like single-bit corruption of some kind. > (( Note that both of the 'old' file sizes are multiples of 8K )) That is because e2fsck doesn't know the correct size, so just uses the end of the last valid block (it isn't possible to have a "hole" at the end of the file). 
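The arithmetic in this thread can be reproduced with a short script. The following is a minimal sketch and not e2fsck code: it clears the highest set bit of each reported i_size and rounds the remainder up to the next block boundary. The 4096-byte block size and both helper names are assumptions made for illustration; the i_size values are the ones quoted from the original report.

```python
# Sketch: test the "single stray bit" reading of the reported i_size values.
# BLOCK_SIZE and both helper names are assumptions, not e2fsprogs code.

BLOCK_SIZE = 4096  # assumed filesystem block size


def clear_top_bit(size):
    """Return (index of the highest set bit, size with that bit cleared)."""
    top = size.bit_length() - 1
    return top, size & ~(1 << top)


def round_up_to_block(size, block_size=BLOCK_SIZE):
    """Round size up to the next multiple of block_size."""
    return -(-size // block_size) * block_size


# (reported i_size, size proposed by e2fsck), copied from the report
reports = [
    (18014398562775391, 53297152),
    (35184386120704, 14032896),
]

for bad_size, fsck_size in reports:
    bit, remainder = clear_top_bit(bad_size)
    padded = round_up_to_block(remainder)
    print(
        f"bit {bit} cleared: {remainder} bytes, "
        f"block-aligned: {padded}, matches e2fsck: {padded == fsck_size}"
    )
```

Under the assumed block size, both remainders round up to exactly the sizes e2fsck proposed, which fits the reading that each i_size differs from the true size by one high bit and that the other differing bits come from rounding to the end of the last valid block.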
> On 9/10/07, Maurice Volaski wrote: > > I have come across two files, essentially untouched in years, on two > > different ext3 filesystems on the same server, Gentoo AMD 64-bit with > > kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously > > becoming supremely large: > > > > Filesystem one > > Inode 16257874, i_size is 18014398562775391, should be 53297152 > > > > Filesystem two > > Inode 2121855, i_size is 35184386120704, should be 14032896. > > > > Both were discovered during an ordinary backup operation (via EMC > > Insiginia's Retrospect Linux client). > > > > The backup runs daily and so one day, one file must have grew > > spontaneously to this size and then on another day, it happened to > > the second file, which is on a second filesystem. The backup attempt > > generated repeated errors: > > > > EXT3-fs warning (device dm-2): ext3_block_to_path: block > big > > > > Both filesystems are running on different logical volumes, but > > underlying that is are drbd network raid devices and underlying that > > is a RAID 6-based SATA disk array. > > -- > > > > Maurice Volaski, mvolaski at aecom.yu.edu > > Computing Support, Rose F. Kennedy Center > > Albert Einstein College of Medicine of Yeshiva University > > > > _______________________________________________ > > Ext3-users mailing list > > Ext3-users at redhat.com > > https://www.redhat.com/mailman/listinfo/ext3-users > > > > > -- > Stephen Samuel http://www.bcgreen.com > 778-861-7641 > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
From mvolaski at aecom.yu.edu Wed Sep 12 07:05:59 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Wed, 12 Sep 2007 03:05:59 -0400 Subject: Spontaneous development of supremely large files on different ext3 filesystems Message-ID: > > (( Note that both of the 'old' file sizes are multiples of 8K )) > >That is because e2fsck doesn't know the correct size, so just uses >the end of the last valid block (it isn't possible to have a "hole" >at the end of the file). It looks like more than 1 bit was different and if I understand this correctly, those other bit changes are the result of this after-the-fact padding by e2fsck. >The filesize is basically the same, except for the addition of a stray >bit, way off in left field. (( Note that both of the 'old' file >Yes, it looks like single-bit corruption of some kind. So does this imply a spontaneous bit flip on a platter? Shouldn't that have been picked up by the RAID and twice because there is dual parity (RAID 6)? -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From darkonc at gmail.com Wed Sep 12 07:30:59 2007 From: darkonc at gmail.com (Stephen Samuel) Date: Wed, 12 Sep 2007 00:30:59 -0700 Subject: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: Message-ID: <6cd50f9f0709120030t568a6999qe3d4cd74829dcd79@mail.gmail.com> It's not clear where the error occurred. It may actually be that there was a multi-bit error, and that it was incorrectly 'fixed'. It's also still possible that the spurious bit was flipped somewhere in software -- which wouldn't be picked up by the RAID parity, because the RAID parity took that flipped bit into account. If you have hardware RAID, then it's possible that the bit was flipped during or after transmission, but before parity was calculated. Check for disk errors in your log files.
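The point about parity can be illustrated with a toy model. This is a sketch under stated assumptions: a single XOR parity block over three data blocks (RAID 6 adds a second, differently computed syndrome, which is left out here), made-up block contents, and hypothetical helper names. It contrasts a bit flipped before parity is computed with a bit flipped after the stripe has been written.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length blocks byte by byte (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(blocks, parity):
    """True if the stored parity matches the parity recomputed from the data."""
    return xor_parity(blocks) == parity


def flip_bit(block, bit_index):
    """Return a copy of block with one bit inverted."""
    data = bytearray(block)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


# Made-up stripe contents, purely for illustration
good = [b"inode-A!", b"inode-B!", b"inode-C!"]

# Case 1: the bit flips in software/RAM before the array computes parity,
# so the parity is calculated over the already-corrupted data.
corrupted_early = [flip_bit(good[0], 5), good[1], good[2]]
parity_early = xor_parity(corrupted_early)
print("flip before parity, stripe verifies:",
      stripe_is_consistent(corrupted_early, parity_early))

# Case 2: the bit flips on disk after parity was written for the good data.
parity_good = xor_parity(good)
corrupted_late = [flip_bit(good[0], 5), good[1], good[2]]
print("flip after parity, stripe verifies:",
      stripe_is_consistent(corrupted_late, parity_good))
```

In this model the first stripe verifies and the second does not: parity can only flag corruption that happens after it has been computed, which is why a clean array does not rule out a flip further up the data path.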
Generically speaking, I'd be inclined to believe that the lower bits in the large file size are actually the precise size of the file... Check if the size minus the high-order flipped bit is consistent with a logical place to end the file. Note that the bit could have been flipped when a nearby inode (on the same disk/RAID block) was updated. The block was read, modified and re-written, and during that process the bit could have been magically flipped. On 9/12/07, Maurice Volaski wrote: > > > (( Note that both of the 'old' file sizes are multiples of 8K )) > > > >That is because e2fsck doesn't know the correct size, so just uses > >the end of the last valid block (it isn't possible to have a "hole" > >at the end of the file). > > It looks like more than 1 bit was different and if I understand this > correctly, those other bit changes are the result of this after-the-fact > padding by e2fsck. > > > >The filesize is basically the same, except for the addition of a stray > >bit, way off in left field. (( Note that both of the 'old' file > > >Yes, it looks like single-bit corruption of some kind. > > So does this imply a spontaneous bit flip on a platter? Shouldn't > that have been picked up by the RAID and twice because there is dual > parity (RAID 6)? > -- > > Maurice Volaski, mvolaski at aecom.yu.edu > Computing Support, Rose F. Kennedy Center > Albert Einstein College of Medicine of Yeshiva University > -- Stephen Samuel http://www.bcgreen.com 778-861-7641 From filip.sneppe at gmail.com Wed Sep 12 22:53:40 2007 From: filip.sneppe at gmail.com (Filip Sneppe) Date: Thu, 13 Sep 2007 00:53:40 +0200 Subject: userspace tool to freeze/thaw ext3 to create consistent snapshots Message-ID: <9151ac2a0709121553mf14c40cj44ba10717f9d7406@mail.gmail.com> Hi everyone, Suppose one has an ext3 filesystem on a SAN LUN.
I would like to know if it is possible to freeze this ext3 filesystem into something consistent from a filesystem point of view, then trigger the SAN's snapshotting functionality, and then thaw the ext3 filesystem to resume I/O. This would allow near-instantaneous snapshot backups without having to use LVM or LVM snapshots. From googling around, I understand that: - LVM can take consistent snapshots at the FS level, and from posts to LKML, I understand that all Linux filesystems now support suspending I/O activity during these snapshots as a requirement for LVM snapshotting. - XFS has a userspace command "xfs_freeze [-u] mountpoint" which appears to do exactly what I want to achieve with ext3 - GFS has a userspace command "gfs_tool [un]freeze mountpoint" which appears to do the same thing So, my questions are: - Is this currently possible from userspace with ext3 ? If not, is this hard to write ? (I am not an expert programmer, but from looking at the XFS and GFS userspace code, things are done totally differently) - Given that all Linux FS support suspending of I/O operations, is it technically possible to write a generic userspace tool to do just that: freezing/thawing I/O requests to a mountpoint from userspace, no matter what the underlying FS is ? From the various posts with similar questions that pop up on various mailing lists and support forums, it would appear to me that there is some level of interest/demand for this type of feature. Regards, Filip From jp at enix.org Wed Sep 12 23:52:34 2007 From: jp at enix.org (Jérôme Petazzoni) Date: Thu, 13 Sep 2007 01:52:34 +0200 Subject: userspace tool to freeze/thaw ext3 to create consistent snapshots In-Reply-To: <9151ac2a0709121553mf14c40cj44ba10717f9d7406@mail.gmail.com> References: <9151ac2a0709121553mf14c40cj44ba10717f9d7406@mail.gmail.com> Message-ID: <46E87BC2.8040007@enix.org> Filip Sneppe wrote: > Hi everyone, > > Suppose one has an ext3 filesystem on a SAN LUN.
> I would like to know if it is possible to freeze this ext3 filesystem > into something > consistent from a filesystem point of view, then trigger the SAN's snapshotting > functionality, and then thaw the ext3 filesystem to resume I/O. This > would allow > near-instantaneous snapshot backups without having to use LVM or > LVM snapshots. > I think (but I might be wrong) that this feature is not implemented - yet - for ext3fs. However, LVM2 should let you create read-write snapshots. Therefore, you can snapshot whenever you want, then mount the snapshot, like you would mount an "unclean" filesystem (i.e. the log will be replayed, and that's it). One might argue that this could give inconsistent snapshots; however (the gurus will tell if I'm wrong) since your programs are running when you do the snapshot, it should not change anything whether you do a "hot" snapshot or a "thawed" snapshot. regards From adilger at clusterfs.com Thu Sep 13 00:24:35 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Wed, 12 Sep 2007 18:24:35 -0600 Subject: userspace tool to freeze/thaw ext3 to create consistent snapshots In-Reply-To: <9151ac2a0709121553mf14c40cj44ba10717f9d7406@mail.gmail.com> References: <9151ac2a0709121553mf14c40cj44ba10717f9d7406@mail.gmail.com> Message-ID: <20070913002435.GO5377@schatzie.adilger.int> On Sep 13, 2007 00:53 +0200, Filip Sneppe wrote: > Suppose one has an ext3 filesystem on a SAN LUN. > I would like to know if it is possible to freeze this ext3 filesystem > into something > consistent from a filesystem point of view, then trigger the SAN's snapshotting > functionality, and then thaw the ext3 filesystem to resume I/O. This > would allow > near-instantaneous snapshot backups without having to use LVM or > LVM snapshots. You could probably make an ioctl to ext3 to do this, calling the existing code to flush the journal during snapshots. > So, my questions are: > - Is this currently possible from userspace with ext3 ?
If not, is this > hard to write ? (I am not an expert programmer, but from looking at the > XFS and GFS userspace code, things are done totally differently) > - Given that all Linux FS support suspending of I/O operations, > is it technically possible to write a generic userspace tool to do just > that: freezing/thawing I/O requests to a mountpoint from userspace, > no matter what the underlying FS is ? Yes, if there is a common ioctl or syscall - the filesystems themselves have a common method to do the freeze/thaw. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From mb/ext3 at dcs.qmul.ac.uk Mon Sep 17 08:05:58 2007 From: mb/ext3 at dcs.qmul.ac.uk (Matt Bernstein) Date: Mon, 17 Sep 2007 09:05:58 +0100 (BST) Subject: external journals, UUIDs and LVM Message-ID: Hi, I'm starting to use LVM2 on my big servers. I'm using ext3 with data=journal and the journals on different volume groups to the volumes. "tune2fs -l" on the volumes show _both_ the UUID and device major/minor of the journal, irrespective of which you specify during "tunefs -j". With LVM the latter are dymanically allocated (0xfdnn), so may change from boot to boot. My question is: will mount find the journal by UUID? You can see it might be fun if it doesn't ;) Thanks Matt From mvolaski at aecom.yu.edu Mon Sep 17 17:31:11 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Mon, 17 Sep 2007 13:31:11 -0400 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <1189815170.20948.ezmlm@lists.mysql.com> References: <1189815170.20948.ezmlm@lists.mysql.com> Message-ID: In using drbd 8.0.5 recently, I have come across at least two instances where a bit on disk apparently flipped spontaneously in the ext3 metadata on volumes running on top of drbd. 
Also, I have been seeing regular corruption of a mysql database, which runs on top of drbd, and when I reported this as a bug since I also recently upgraded mysql versions, they question whether drbd could be responsible! All the volumes have been fscked recently and there were no reported errors. And, of course, there have been no errors reported from the underlying hardware. I have since upgraded to 8.0.6, but it's too early to say whether there is a change. I'm also seeing the backup server complain of not being files not comparing, though this may be a separate problem on the backup server. The ext-3 bit flipping: At 12:00 PM -0400 9/11/07, ext3-users-request at redhat.com wrote: >I have come across two files, essentially untouched in years, on two >different ext3 filesystems on the same server, Gentoo AMD 64-bit with >kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously >becoming supremely large: > >Filesystem one >Inode 16257874, i_size is 18014398562775391, should be 53297152 > >Filesystem two >Inode 2121855, i_size is 35184386120704, should be 14032896. > >Both were discovered during an ordinary backup operation (via EMC >Insiginia's Retrospect Linux client). > >The backup runs daily and so one day, one file must have grew >spontaneously to this size and then on another day, it happened to >the second file, which is on a second filesystem. The backup attempt >generated repeated errors: > >EXT3-fs warning (device dm-2): ext3_block_to_path: block > big > >Both filesystems are running on different logical volumes, but >underlying that is are drbd network raid devices and underlying that >is a RAID 6-based SATA disk array. The answer to the bug report regarding mysql data corruption, who is blaming drbd! 
>http://bugs.mysql.com/?id=31038 > > Updated by: Heikki Tuuri > Reported by: Maurice Volaski > Category: Server: InnoDB > Severity: S2 (Serious) > Status: Open > Version: 5.0.48 > OS: Linux > OS Details: Gentoo > Tags: database page corruption locking up corrupt doublewrite > >[17 Sep 18:49] Heikki Tuuri > >Maurice, my first guess is to suspect the RAID-1 driver. My initial report of mysql data corruption: >>A 64-bit Gentoo Linux box had just been upgraded from MySQL 4.1 >>to5.0.44 fresh (by dumping in 4.1 and restoring in 5.0.44) and >>almostimmediately after that, during which time the database was >>not used,a crash occurred during a scripted mysqldump. So I >>restored and dayslater, it happened again. The crash details seem >>to be trying tosuggest some other aspect of the operating system, >>even the memoryor disk is flipping a bit. Or could I be running >>into a bug in thisversion of MySQL? >> >>Here's the output of the crash >>----------------------------------- >>InnoDB: Database page corruption on disk or a failed >>InnoDB: file read of page 533. >>InnoDB: You may have to recover from a backup. >>070827 3:10:04 InnoDB: Page dump in ascii and hex (16384 bytes): >> len 16384; hex >> >>[dump itself deleted >>forbrevity] >> >> >> >> ;InnoDB: End of page dump >>070827 3:10:04 InnoDB: Page checksum >>646563254,prior-to-4.0.14-form checksum 2415947328 >>InnoDB: stored checksum 4187530870, prior-to-4.0.14-form >>storedchecksum 2415947328 >>InnoDB: Page lsn 0 4409041, low 4 bytes of lsn at page end 4409041 >>InnoDB: Page number (if stored to page already) 533, >>InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0 >>InnoDB: Page may be an index page where index id is 0 35 >>InnoDB: (index PRIMARY of table elegance/image) >>InnoDB: Database page corruption on disk or a failed >>InnoDB: file read of page 533. >>InnoDB: You may have to recover from a backup. 
>>InnoDB: It is also possible that your operating >>InnoDB: system has corrupted its own file cache >>InnoDB: and rebooting your computer removes the >>InnoDB: error. >>InnoDB: If the corrupt page is an index page >>InnoDB: you can also try to fix the corruption >>InnoDB: by dumping, dropping, and reimporting >>InnoDB: the corrupt table. You can use CHECK >>InnoDB: TABLE to scan your table for corruption. >>InnoDB: See also >>InnoDB:http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html >>InnoDB: about forcing recovery. >InnoDB: Ending processing because of a corrupt database page. -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From adilger at clusterfs.com Mon Sep 17 17:51:56 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Mon, 17 Sep 2007 11:51:56 -0600 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: <1189815170.20948.ezmlm@lists.mysql.com> Message-ID: <20070917175156.GO2990@schatzie.adilger.int> On Sep 17, 2007 13:31 -0400, Maurice Volaski wrote: > In using drbd 8.0.5 recently, I have come across at least two > instances where a bit on disk apparently flipped spontaneously in the > ext3 metadata on volumes running on top of drbd. > > Also, I have been seeing regular corruption of a mysql database, > which runs on top of drbd, and when I reported this as a bug since I > also recently upgraded mysql versions, they question whether drbd > could be responsible! Seems unlikely - more likely to be RAM or similar (would include cable for PATA/SCSI but that is less likely an issue for SATA). Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
From mvolaski at aecom.yu.edu Mon Sep 17 18:37:06 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Mon, 17 Sep 2007 14:37:06 -0400 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <20070917175156.GO2990@schatzie.adilger.int> References: <1189815170.20948.ezmlm@lists.mysql.com> <20070917175156.GO2990@schatzie.adilger.int> Message-ID: >On Sep 17, 2007 13:31 -0400, Maurice Volaski wrote: >> In using drbd 8.0.5 recently, I have come across at least two >> instances where a bit on disk apparently flipped spontaneously in the >> ext3 metadata on volumes running on top of drbd. >> >> Also, I have been seeing regular corruption of a mysql database, >> which runs on top of drbd, and when I reported this as a bug since I >> also recently upgraded mysql versions, they question whether drbd >> could be responsible! > >Seems unlikely - more likely to be RAM or similar (would include cable >for PATA/SCSI but that is less likely an issue for SATA). > Shouldn't trip the ECC and produce machine check exceptions and ones that were unrecoverable? The disks are part of hardware RAID with a SATA II cableless backplane and SATA-SCSI controller, so there is a SCSI cable and SCSI HBA (LSI Logic). -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From mvolaski at aecom.yu.edu Mon Sep 17 19:32:04 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Mon, 17 Sep 2007 15:32:04 -0400 Subject: Could drbd randomly flip bits? 
Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <46EECE74.7010103@provenscaling.com> References: <1189815170.20948.ezmlm@lists.mysql.com> <46EECE74.7010103@provenscaling.com> Message-ID: >Hi Maurice, > >If you're running into corruption both in ext3 metadata and in MySQL >data, it is certainly not he fault of MySQL as you're likely aware. I am hoping they are not related. The problems with MySQL surfaced almost immediately after upgrading to 5.0.x. >[details deleted] > >You can see that there are in fact many bits flipped in each. I >would suspect higher-level corruption than I initially thought this as well, but the explanation on the ext3 mailing list is that it really is just a lone flipped bit in both instances. The other differences are due to fsck padding out the block when it guesses what the correct size is. >Do note that data on e.g. the PCI bus is not protected by any sort >of checksum. I've seen this cause corruption problems with PCI >risers and RAID cards. Are you using a PCI riser card? Note that >LSI does *not* certify their cards to be used on risers if you are >custom building a machine. > Yes, there is a riser card. Wouldn't this imply that LSI is saying you can't use a 1U or a 2U box? It's kind of scary there is no end-to-end parity implemented somewhere along the whole data path to prevent this. It sort of defeats the point of RAID 6 and ECC. How did you determine this was the cause? >Do you mean a Serially-Attached SCSI aka SAS controller, I assume? No, it's SATA to SCSI. >Is this a custom build machine or a vendor integrated one? It is custom-built. > >Maurice Volaski wrote: >>In using drbd 8.0.5 recently, I have come across at least two >>instances where a bit on disk apparently flipped spontaneously in >>the ext3 metadata on volumes running on top of drbd. 
>> >>Also, I have been seeing regular corruption of a mysql database, >>which runs on top of drbd, and when I reported this as a bug since >>I also recently upgraded mysql versions, they question whether drbd >>could be responsible! >> >>All the volumes have been fscked recently and there were no >>reported errors. And, of course, there have been no errors reported >>from the underlying hardware. >> >>I have since upgraded to 8.0.6, but it's too early to say whether >>there is a change. >> >>I'm also seeing the backup server complain of not being files not >>comparing, though this may be a separate problem on the backup >>server. >> >> >> >>The ext-3 bit flipping: >>At 12:00 PM -0400 9/11/07, ext3-users-request at redhat.com wrote: >>>I have come across two files, essentially untouched in years, on two >>>different ext3 filesystems on the same server, Gentoo AMD 64-bit with >>>kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously >>>becoming supremely large: >>> >>>Filesystem one >>>Inode 16257874, i_size is 18014398562775391, should be 53297152 >>> >>>Filesystem two >>>Inode 2121855, i_size is 35184386120704, should be 14032896. >>> >>>Both were discovered during an ordinary backup operation (via EMC >>>Insiginia's Retrospect Linux client). >>> >>>The backup runs daily and so one day, one file must have grew >>>spontaneously to this size and then on another day, it happened to >>>the second file, which is on a second filesystem. The backup attempt >>>generated repeated errors: >>> >>>EXT3-fs warning (device dm-2): ext3_block_to_path: block > big >>> >>>Both filesystems are running on different logical volumes, but >>>underlying that is are drbd network raid devices and underlying that >>>is a RAID 6-based SATA disk array. >> >> >> >>The answer to the bug report regarding mysql data corruption, who >>is blaming drbd! 
>>>http://bugs.mysql.com/?id=31038 >>> >>> Updated by: Heikki Tuuri >>> Reported by: Maurice Volaski >>> Category: Server: InnoDB >>> Severity: S2 (Serious) >>> Status: Open >>> Version: 5.0.48 >>> OS: Linux >>> OS Details: Gentoo >>> Tags: database page corruption locking up corrupt doublewrite >>> >>>[17 Sep 18:49] Heikki Tuuri >>> >>>Maurice, my first guess is to suspect the RAID-1 driver. >> >> >>My initial report of mysql data corruption: >>>>A 64-bit Gentoo Linux box had just been upgraded from MySQL 4.1 >>>>to5.0.44 fresh (by dumping in 4.1 and restoring in 5.0.44) and >>>>almostimmediately after that, during which time the database was >>>>not used,a crash occurred during a scripted mysqldump. So I >>>>restored and dayslater, it happened again. The crash details seem >>>>to be trying tosuggest some other aspect of the operating system, >>>>even the memoryor disk is flipping a bit. Or could I be running >>>>into a bug in thisversion of MySQL? >>>> >>>>Here's the output of the crash >>>>----------------------------------- >>>>InnoDB: Database page corruption on disk or a failed >>>>InnoDB: file read of page 533. >>>>InnoDB: You may have to recover from a backup. >>>>070827 3:10:04 InnoDB: Page dump in ascii and hex (16384 bytes): >>>> len 16384; hex >>>> >>>>[dump itself deleted >>>>forbrevity] >>>> >>>> >>>> >>>> >>>> ;InnoDB: End of page dump >>>>070827 3:10:04 InnoDB: Page checksum >>>>646563254,prior-to-4.0.14-form checksum 2415947328 >>>>InnoDB: stored checksum 4187530870, prior-to-4.0.14-form >>>>storedchecksum 2415947328 >>>>InnoDB: Page lsn 0 4409041, low 4 bytes of lsn at page end 4409041 >>>>InnoDB: Page number (if stored to page already) 533, >>>>InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0 >>>>InnoDB: Page may be an index page where index id is 0 35 >>>>InnoDB: (index PRIMARY of table elegance/image) >>>>InnoDB: Database page corruption on disk or a failed >>>>InnoDB: file read of page 533. 
>>>>InnoDB: You may have to recover from a backup. >>>>InnoDB: It is also possible that your operating >>>>InnoDB: system has corrupted its own file cache >>>>InnoDB: and rebooting your computer removes the >>>>InnoDB: error. >>>>InnoDB: If the corrupt page is an index page >>>>InnoDB: you can also try to fix the corruption >>>>InnoDB: by dumping, dropping, and reimporting >>>>InnoDB: the corrupt table. You can use CHECK >>>>InnoDB: TABLE to scan your table for corruption. >>>>InnoDB: See also >>>>InnoDB:http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html >>>>InnoDB: about forcing recovery. >>>InnoDB: Ending processing because of a corrupt database page. >> > >-- >high performance mysql consulting >www.provenscaling.com -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From darkonc at gmail.com Mon Sep 17 19:59:11 2007 From: darkonc at gmail.com (Stephen Samuel) Date: Mon, 17 Sep 2007 12:59:11 -0700 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <20070917175156.GO2990@schatzie.adilger.int> References: <1189815170.20948.ezmlm@lists.mysql.com> <20070917175156.GO2990@schatzie.adilger.int> Message-ID: <6cd50f9f0709171259l5e1507b7k346baeb6d55349e6@mail.gmail.com> You may want to exercise your I/O subsystem. Given that you probably don't want to stomp on a live filesystem, you might want to create a file of a couple of gigabytes and turn it into a pseudo-device with 'losetup(8)'. EG:

# make a 15GB test file
dd if=/dev/zero of=the_testfile bs=1M count=15000
# find a free loopback pseudo-device
device=`losetup -f`
# attach it to the test file
losetup $device the_testfile
# exercise this block of data with a read-write test;
# -p5 repeats the scan until 5 consecutive passes find no new bad blocks
nice badblocks -w -p5 $device
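The same write-then-read-back idea can be sketched at file level in Python. This is only an illustration and not a substitute for badblocks: the function names, the pattern generator, and the block sizes are all assumptions made for the example. Note also that the read-back may be served from the page cache, so a real run would need a file much larger than RAM (or dropped caches) before it says anything about the disk path.

```python
import hashlib
import os


def pattern_block(seed: int, index: int, size: int) -> bytes:
    """Deterministic pseudo-random block, reproducible from (seed, index)."""
    return hashlib.shake_256(f"{seed}:{index}".encode()).digest(size)


def write_pattern(path: str, blocks: int, block_size: int = 1 << 20, seed: int = 0) -> None:
    """Fill the file with known data and ask the OS to flush it to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, blocks: int, block_size: int = 1 << 20, seed: int = 0) -> list:
    """Return the indexes of blocks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != pattern_block(seed, i, block_size):
                bad.append(i)
    return bad
```

A typical use would be to call write_pattern() on a scratch file on the suspect volume, then call verify_pattern() repeatedly under load and log any non-empty result.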
The other thing to do would be a memory test, to make sure that there's not something very wrong with your memory subsystem. I think that there are tools that can do a *partial* memtest on a live system, but a (really) quick look didn't find them. Most distributions have a memtest boot option which runs a (reasonably) complete memory test. On 9/17/07, Andreas Dilger wrote: > On Sep 17, 2007 13:31 -0400, Maurice Volaski wrote: > > In using drbd 8.0.5 recently, I have come across at least two > > instances where a bit on disk apparently flipped spontaneously in the > > ext3 metadata on volumes running on top of drbd. > > > > Also, I have been seeing regular corruption of a mysql database, > > which runs on top of drbd, and when I reported this as a bug since I > > also recently upgraded mysql versions, they question whether drbd > > could be responsible! > > Seems unlikely - more likely to be RAM or similar (would include cable > for PATA/SCSI but that is less likely an issue for SATA). > > Cheers, Andreas > -- > Andreas Dilger > Principal Software Engineer > Cluster File Systems, Inc. > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -- Stephen Samuel http://www.bcgreen.com 778-861-7641 From mvolaski at aecom.yu.edu Mon Sep 17 22:47:33 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Mon, 17 Sep 2007 18:47:33 -0400 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <46EEDC7F.1090903@provenscaling.com> References: <1189815170.20948.ezmlm@lists.mysql.com> <46EECE74.7010103@provenscaling.com> <46EEDC7F.1090903@provenscaling.com> Message-ID: I guess I will watch it closely for now and if it trips up again fail over to the drbd peer and see what happens there. 
I suppose I could even detach the local disks and have it run using the peer over the wire. That should eliminate the local I/O subsystem. >>It's kind of scary there is no end-to-end parity implemented >>somewhere along the whole data path to prevent this. It sort of >>defeats the point of RAID 6 and ECC. > >I agree, it's pretty damn scary. You can read about the story and >the ensuing discussion here: I wonder if drbd could help out with that. >Interesting. I hadn't heard of such a thing until I just looked it >up. But in any case that adds yet another variable (and a fairly >uncommon one) to the mix. > It's this one: http://www.acnc.com/02_01_jetstor_sata_416s.html. I thought units like it were very popular. -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From mvolaski at aecom.yu.edu Tue Sep 18 05:50:03 2007 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Tue, 18 Sep 2007 01:50:03 -0400 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems Message-ID: I failed over the server and ran a short backup and there were no "didn't compare" errors, whereas on the first server they show up pretty reliably. I guess this confirms some hardware on the first server is flipping bits. Essentially, users could have any number of munged files (most files are binary) since the problem surfaced a few weeks ago, and there'd be no way to know. Unfortunately, the secondary server was off for a short time at one point, so even if the munging were taking place on the I/O subsystem and not in RAM, it is possible that some blocks got copied badly to the secondary server. Anyway, it seems the problem is definitely hardware and not due to either ext3, drbd or mysql! 
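Where an older, trusted copy of the data does exist (for example a backup taken before the problem surfaced), one way to look for munged files is to compare per-file checksums. The sketch below is an assumption-laden illustration, not something used in this thread: the function names are made up, and any file that was legitimately modified since the reference copy will also be reported, so the result is a list of candidates rather than proof of corruption.

```python
import hashlib
import os


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = sha256_of(full)
    return result


def differing_files(reference: dict, suspect: dict) -> list:
    """Relative paths present in both manifests whose digests differ."""
    common = reference.keys() & suspect.keys()
    return sorted(p for p in common if reference[p] != suspect[p])
```

Running manifest() once on the reference tree and once on the live tree, then passing both to differing_files(), gives the set of files worth inspecting by hand.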
-- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From jeremy at provenscaling.com Mon Sep 17 18:59:00 2007 From: jeremy at provenscaling.com (Jeremy Cole) Date: Mon, 17 Sep 2007 11:59:00 -0700 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: <1189815170.20948.ezmlm@lists.mysql.com> Message-ID: <46EECE74.7010103@provenscaling.com> Hi Maurice, If you're running into corruption both in ext3 metadata and in MySQL data, it is certainly not he fault of MySQL as you're likely aware. There are absolutely many places where corruption could occur between MySQL and the physical bits on disk. The corruption you're seeing does not appear to be just "flipped bits", although I guess any corruption could be called that. If you compare the two i_sizes you see from below: >> Inode 16257874, i_size is 18014398562775391, should be 53297152 53297152: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 0010 1101 0100 0000 0000 0000 18014398562775391: 0000 0000 0100 0000 0000 0000 0000 0000 0000 0011 0010 1101 0011 0001 0101 1111 Differences: 10 x 0->1, 1 x 1->0. >> Inode 2121855, i_size is 35184386120704, should be 14032896. 14032896: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1101 0110 0010 0000 0000 0000 35184386120704: 0000 0000 0000 0000 0010 0000 0000 0000 0000 0000 1101 0110 0001 1100 0000 0000 Differences: 4 x 0->1, 1 x 1->0 You can see that there are in fact many bits flipped in each. I would suspect higher-level corruption than the actual disks (typical single bit or double bit flips, and generally 1->0 only) but lower than the OS (typical entire page corruptions of 4k-64k). 
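The bit counts above can be checked mechanically, along with the alternative reading mentioned elsewhere in this thread: that only one high bit flipped and the remaining differences come from fsck rounding the size up to the end of the last block. The following is a small sketch; the helper names and the 4096-byte block size are assumptions for illustration.

```python
def flipped_bits(a: int, b: int) -> list:
    """Bit positions (0 = least significant) at which a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def round_up(n: int, block: int = 4096) -> int:
    """Round n up to the next multiple of the block size."""
    return -(-n // block) * block


# (size suggested by fsck, size found on disk), taken from the fsck messages
cases = [
    (53297152, 18014398562775391),
    (14032896, 35184386120704),
]

for fsck_size, on_disk in cases:
    bits = flipped_bits(fsck_size, on_disk)
    high = bits[-1]
    # Clear only the highest differing bit, then pad to a block boundary.
    repaired = round_up(on_disk ^ (1 << high))
    print(f"{len(bits)} differing bits, highest is bit {high}; "
          f"single-bit repair plus rounding matches fsck: {repaired == fsck_size}")
```

For both inodes the last column comes out True, which is consistent with a single flipped bit (bit 54 and bit 45 respectively) followed by block rounding.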
That leaves network, SATA controller, various system buses, and possibly stupid errors in DRBD (although I'd call this unlikely). Do note that data on e.g. the PCI bus is not protected by any sort of checksum. I've seen this cause corruption problems with PCI risers and RAID cards. Are you using a PCI riser card? Note that LSI does *not* certify their cards to be used on risers if you are custom building a machine. Regards, Jeremy Maurice Volaski wrote: > In using drbd 8.0.5 recently, I have come across at least two > instances where a bit on disk apparently flipped spontaneously in the > ext3 metadata on volumes running on top of drbd. > > Also, I have been seeing regular corruption of a mysql database, > which runs on top of drbd, and when I reported this as a bug since I > also recently upgraded mysql versions, they question whether drbd > could be responsible! > > All the volumes have been fscked recently and there were no reported > errors. And, of course, there have been no errors reported from the > underlying hardware. > > I have since upgraded to 8.0.6, but it's too early to say whether > there is a change. > > I'm also seeing the backup server complain of not being files not > comparing, though this may be a separate problem on the backup server. > > > > The ext-3 bit flipping: > At 12:00 PM -0400 9/11/07, ext3-users-request at redhat.com wrote: >> I have come across two files, essentially untouched in years, on two >> different ext3 filesystems on the same server, Gentoo AMD 64-bit with >> kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously >> becoming supremely large: >> >> Filesystem one >> Inode 16257874, i_size is 18014398562775391, should be 53297152 >> >> Filesystem two >> Inode 2121855, i_size is 35184386120704, should be 14032896. >> >> Both were discovered during an ordinary backup operation (via EMC >> Insiginia's Retrospect Linux client). 
>> >> The backup runs daily and so one day, one file must have grew >> spontaneously to this size and then on another day, it happened to >> the second file, which is on a second filesystem. The backup attempt >> generated repeated errors: >> >> EXT3-fs warning (device dm-2): ext3_block_to_path: block > big >> >> Both filesystems are running on different logical volumes, but >> underlying that is are drbd network raid devices and underlying that >> is a RAID 6-based SATA disk array. > > > > The answer to the bug report regarding mysql data corruption, who is > blaming drbd! >> http://bugs.mysql.com/?id=31038 >> >> Updated by: Heikki Tuuri >> Reported by: Maurice Volaski >> Category: Server: InnoDB >> Severity: S2 (Serious) >> Status: Open >> Version: 5.0.48 >> OS: Linux >> OS Details: Gentoo >> Tags: database page corruption locking up corrupt doublewrite >> >> [17 Sep 18:49] Heikki Tuuri >> >> Maurice, my first guess is to suspect the RAID-1 driver. > > > My initial report of mysql data corruption: >>> A 64-bit Gentoo Linux box had just been upgraded from MySQL 4.1 >>> to5.0.44 fresh (by dumping in 4.1 and restoring in 5.0.44) and >>> almostimmediately after that, during which time the database was >>> not used,a crash occurred during a scripted mysqldump. So I >>> restored and dayslater, it happened again. The crash details seem >>> to be trying tosuggest some other aspect of the operating system, >>> even the memoryor disk is flipping a bit. Or could I be running >>> into a bug in thisversion of MySQL? >>> >>> Here's the output of the crash >>> ----------------------------------- >>> InnoDB: Database page corruption on disk or a failed >>> InnoDB: file read of page 533. >>> InnoDB: You may have to recover from a backup. 
>>> 070827 3:10:04 InnoDB: Page dump in ascii and hex (16384 bytes): >>> len 16384; hex >>> >>> [dump itself deleted >>> forbrevity] >>> >>> >>> >>> ;InnoDB: End of page dump >>> 070827 3:10:04 InnoDB: Page checksum >>> 646563254,prior-to-4.0.14-form checksum 2415947328 >>> InnoDB: stored checksum 4187530870, prior-to-4.0.14-form >>> storedchecksum 2415947328 >>> InnoDB: Page lsn 0 4409041, low 4 bytes of lsn at page end 4409041 >>> InnoDB: Page number (if stored to page already) 533, >>> InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0 >>> InnoDB: Page may be an index page where index id is 0 35 >>> InnoDB: (index PRIMARY of table elegance/image) >>> InnoDB: Database page corruption on disk or a failed >>> InnoDB: file read of page 533. >>> InnoDB: You may have to recover from a backup. >>> InnoDB: It is also possible that your operating >>> InnoDB: system has corrupted its own file cache >>> InnoDB: and rebooting your computer removes the >>> InnoDB: error. >>> InnoDB: If the corrupt page is an index page >>> InnoDB: you can also try to fix the corruption >>> InnoDB: by dumping, dropping, and reimporting >>> InnoDB: the corrupt table. You can use CHECK >>> InnoDB: TABLE to scan your table for corruption. >>> InnoDB: See also >>> InnoDB:http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html >>> InnoDB: about forcing recovery. >> InnoDB: Ending processing because of a corrupt database page. > -- high performance mysql consulting www.provenscaling.com From jeremy at provenscaling.com Mon Sep 17 19:00:29 2007 From: jeremy at provenscaling.com (Jeremy Cole) Date: Mon, 17 Sep 2007 12:00:29 -0700 Subject: Could drbd randomly flip bits? 
Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: <1189815170.20948.ezmlm@lists.mysql.com> <20070917175156.GO2990@schatzie.adilger.int> Message-ID: <46EECECD.80907@provenscaling.com> Hi Maurice, Do you mean a Serially-Attached SCSI aka SAS controller, I assume? Is this a custom build machine or a vendor integrated one? Regards, Jeremy Maurice Volaski wrote: >> On Sep 17, 2007 13:31 -0400, Maurice Volaski wrote: >>> In using drbd 8.0.5 recently, I have come across at least two >>> instances where a bit on disk apparently flipped spontaneously in the >>> ext3 metadata on volumes running on top of drbd. >>> >>> Also, I have been seeing regular corruption of a mysql database, >>> which runs on top of drbd, and when I reported this as a bug since I >>> also recently upgraded mysql versions, they question whether drbd >>> could be responsible! >> Seems unlikely - more likely to be RAM or similar (would include cable >> for PATA/SCSI but that is less likely an issue for SATA). >> > > Shouldn't trip the ECC and produce machine check exceptions and ones > that were unrecoverable? > > The disks are part of hardware RAID with a SATA II cableless > backplane and SATA-SCSI controller, so there is a SCSI cable and SCSI > HBA (LSI Logic). -- high performance mysql consulting www.provenscaling.com From jeremy at provenscaling.com Mon Sep 17 19:58:55 2007 From: jeremy at provenscaling.com (Jeremy Cole) Date: Mon, 17 Sep 2007 12:58:55 -0700 Subject: Could drbd randomly flip bits? 
Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: References: <1189815170.20948.ezmlm@lists.mysql.com> <46EECE74.7010103@provenscaling.com> Message-ID: <46EEDC7F.1090903@provenscaling.com> Hi Maurice, >> If you're running into corruption both in ext3 metadata and in MySQL >> data, it is certainly not he fault of MySQL as you're likely aware. > > I am hoping they are not related. The problems with MySQL surfaced > almost immediately after upgrading to 5.0.x. It's possible that they are not related, but it could even be 5.0 specific but still not a MySQL bug. I.e. MySQL 5.0 could be doing something that steps on the bug and causes it to occur. But, it's hard to say anything for sure. Nonetheless, I generally don't bother worrying about the possibility of MySQL bugs until I'm sure that the OS and hardware are stable. >> You can see that there are in fact many bits flipped in each. I >> would suspect higher-level corruption than > > I initially thought this as well, but the explanation on the ext3 > mailing list is that it really is just a lone flipped bit in both > instances. The other differences are due to fsck padding out the > block when it guesses what the correct size is. Interesting. Can you forward that mail to me personally, or summarize for the list? I'd be interested to read the explanation. >> Do note that data on e.g. the PCI bus is not protected by any sort >> of checksum. I've seen this cause corruption problems with PCI >> risers and RAID cards. Are you using a PCI riser card? Note that >> LSI does *not* certify their cards to be used on risers if you are >> custom building a machine. > > Yes, there is a riser card. Wouldn't this imply that LSI is saying > you can't use a 1U or a 2U box? Kind of. 
Presumably you would be buying a vendor integrated solution where they have certified that the riser card and RAID card are compatible. Presumably. You'll also notice that most vendors are moving to controllers that aren't PCI{,-E,-X} slot based, and rather connect directly to a low-profile integrated slot. This removes a few variables. (And frees up some space.) > It's kind of scary there is no end-to-end parity implemented > somewhere along the whole data path to prevent this. It sort of > defeats the point of RAID 6 and ECC. I agree, it's pretty damn scary. You can read about the story and the ensuing discussion here: http://jcole.us/blog/archives/2006/09/04/on-1u-cases-pci-risers-and-lsi-megaraid/ > How did you determine this was the cause? Isolating lots of variables. The customer in question had a workload that could reproduce the problem reliably, although not in the same place or same time to be able to track things down, and not under debug mode (which likely slowed things down enough to not cause trouble). I finally suggested that they isolate the riser card as a variable by plugging it directly into the slot. Since it was a 1U machine, it required taking the metal frame off the card and leaving the case open (and hanging out into the datacenter aisle). it could then be shown that with riser, corruption always occurred, and without the riser, corruption never occurred. Obviously, running the machines with cases open and cards plugged in directly was not an option, so the only other possible option was chosen: move to all new hardware with integrated RAID. (HP and their integrated SmartArray/cciss controller was chosen as a vendor in this case.) >> Do you mean a Serially-Attached SCSI aka SAS controller, I assume? > > No, it's SATA to SCSI. Interesting. I hadn't heard of such a thing until I just looked it up. But in any case that adds yet another variable (and a fairly uncommon one) to the mix. 
Regards, Jeremy -- high performance mysql consulting www.provenscaling.com From adilger at clusterfs.com Wed Sep 19 17:11:06 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Wed, 19 Sep 2007 11:11:06 -0600 Subject: Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems In-Reply-To: <46EECE74.7010103@provenscaling.com> References: <1189815170.20948.ezmlm@lists.mysql.com> <46EECE74.7010103@provenscaling.com> Message-ID: <20070919171106.GO32520@schatzie.adilger.int> On Sep 17, 2007 11:59 -0700, Jeremy Cole wrote: > >> Inode 16257874, i_size is 18014398562775391, should be 53297152 > > 53297152: > > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0011 0010 1101 0100 0000 0000 0000 > > 18014398562775391: > > 0000 0000 0100 0000 0000 0000 0000 0000 > 0000 0011 0010 1101 0011 0001 0101 1111 Actually, since e2fsck doesn't know the right file size, it rounds to the end of the last valid block, hence some of the last 12 bits flipped and an increment. > >> Inode 2121855, i_size is 35184386120704, should be 14032896. > > 14032896: > > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 1101 0110 0010 0000 0000 0000 > > 35184386120704: > > 0000 0000 0000 0000 0010 0000 0000 0000 > 0000 0000 1101 0110 0001 1100 0000 0000 Same. > I would > suspect higher-level corruption than the actual disks (typical single > bit or double bit flips, and generally 1->0 only) but lower than the OS > (typical entire page corruptions of 4k-64k). > > That leaves network, SATA controller, various system buses, and possibly > stupid errors in DRBD (although I'd call this unlikely). > > Do note that data on e.g. the PCI bus is not protected by any sort of > checksum. I've seen this cause corruption problems with PCI risers and > RAID cards. Are you using a PCI riser card? 
Note that LSI does *not* > certify their cards to be used on risers if you are custom building a > machine. > Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From tango at tiac.net Thu Sep 20 21:36:17 2007 From: tango at tiac.net (Thomas Watt) Date: Thu, 20 Sep 2007 17:36:17 -0400 (EDT) Subject: How are alternate superblocks repaired? Message-ID: <23459344.1190324177569.JavaMail.root@mswamui-backed.atl.sa.earthlink.net> Hi, Using dumpe2fs I have been able to determine that all of my alternate ext3 superblocks are corrupted (not clean), and only the primary superblock is valid, i.e. mount works and the ordered journal is applied. When the primary superblock gets flakey, i.e. the ext_attr Filesystem feature goes missing - not sure why this occurs. At this point, the mount does not apply the journal using the primary superblock and mount completes without it. Usually, I will resort to booting up the FC3 OS hard drive on which the ext3 filesystem resides to fix at least the primary superblock via fsck. This situation is just the reverse of the normal assumptions the kernel and filesystem make in their design, i.e. that the alternate superblocks remain intact when the primary is hosed - not a good place to be, and evidence that the situation can occur. I do not think that this is a kernel bug, but possibly an omission since it never spawns a kernel process (during idle time) to check the consistency of all of the superblocks in the filesystem, i.e. self-diagnosing and repair, during idle time - surely this would improve the reliability of the filesystem. Just a random thought I had while thinking about the problem. When I have run fsck on boot up of the FC3 OS, that seems to repair the primary superblock, but the alternates are never repaired to be consistent with the primary superblock - that's all fsck ever seems to do. Why does fsck not repair the alternate superblocks when it has opportunity to do so? 
Shouldn't fsck at least detect the inconsistency with the kernel assuptions that alternate superblocks are valid, and only the primary superblock needs to be repaired after something catastrophic occurs? Shouldn't the inconsistency be reported - at the very least? Or, shouldn't there be an option to direct fsck to fix alternate superblock inconsistency, if so desired? One would think so. The Linux disk in question is an 80GB SATA drive, with an ext3 filesystem where the Filesystem features are: has_journal, ext_attr, filetype, sparse_super with the good primary superblock. The alternate superblocks all are absent the ext_attr feature, and also, the primary maximum mount count is -1 when the primary superblock goes flakey. Normally, I do not boot up the FC3 OS, but mount the disk from a Live CD to move data into the Live CD environment. The FC3 kernel is a 2.6.10-1 version. The Live CD kernels are either 2.6.15.6 or 2.6.20-16-generic. Also, the Live CD e2fsprogs are 1.40 WIP for the 2.6.20-16-generic kernel vs. 1.38 for the 2.6.15.6 kernel (both of the 2.6 kernels are not FCn OS). Interestingly, the problem (flakey primary superblock where the journal is not applied) does not manifest with the 2.6.15.6 kernel Live CD, but only with the 2.6.20-16-generic kernel Live CD which I usually run currently. Recently, because I do not know the origin of the problem, I have resorted to issuing three sync commands from the Live CD environment after I have moved data to the FC3 ext3 journal filesystem (mounted with -o sync) prior to issuing a umount command. At least the file system buffers will be flushed. I do not know if not doing this previously may have contributed to the initial problem of the primary superblock going flakey or not. Will the command: e2fsck -fp /dev/sdb2 repair the alternate superblocks, and if so, should it only be run from the Live CD environment? 
Or, do I need to get into runlevel 1 as single user to issue the command after unmounting the hard drive in order to run it?

Or, will a dd command using skip and seek for the primary and alternate superblocks correct their corruption, as in the following example:

For the purpose of this example, here is a truncated list of the primary and 1st alternative superblocks from the output of the dumpe2fs command:

Primary superblock at 0, Group descriptors at 1-5
Backup superblock at 32768, Group descriptors at 32769-32773

Given: FS blocksize=4096; primary superblock at=0; 1st alternative superblock at=32768 and size of superblock=1024 <=== Is this correct???

To copy the 1st backup superblock (assuming it is clean) to fix primary superblock:

# dd if=/dev/sdb2 of=/dev/sdbn bs=1024 skip=32768 count=1

To copy the primary superblock (assuming it is clean) to fix the 1st backup superblock:

# dd if=/dev/sdb2 of=/dev/sdbn bs=1024 seek=32768 count=1

I am leery of using the dd commands to effect the repairs to the alternate superblocks - will they work or hose the filesystem completely? My guess is that they will hose the filesystem completely. Is this correct, and why?

Also, if the e2fsck -fp /dev/sdb2 command will not repair the alternate superblocks, what tool will - debugfs? And how do I use it to make the repairs?

In the event that no tool will repair the alternate superblocks, what process can I use to effect the repairs to the alternate superblocks so that they can finally be in accord with the original design assumptions for the kernel and ext3 filesystem (consistent with the primary superblock)? 
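As a sanity check on the offsets in the dd question, here is a small sketch of where ext2/ext3 superblock copies live. It assumes the usual on-disk layout: the superblock is 1024 bytes long, the primary copy sits at byte offset 1024, each backup sits at the start of its block group, and with sparse_super only groups 0, 1 and powers of 3, 5 and 7 carry a copy. The 4096-byte block size and 32768 blocks per group are taken from the dumpe2fs output quoted above; the function names are illustrative. The code only does arithmetic and touches no device.

```python
SUPERBLOCK_SIZE = 1024  # bytes, independent of the filesystem block size


def is_power_of(n: int, base: int) -> bool:
    """True if n is base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def group_has_superblock(group: int) -> bool:
    """sparse_super rule: groups 0, 1 and powers of 3, 5 and 7 hold a copy."""
    return group in (0, 1) or any(is_power_of(group, b) for b in (3, 5, 7))


def superblock_byte_offset(group: int, block_size: int = 4096,
                           blocks_per_group: int = 32768,
                           first_data_block: int = 0) -> int:
    """Byte offset of the superblock copy stored in the given block group."""
    if group == 0:
        return 1024  # the primary copy always starts 1024 bytes into the device
    return (first_data_block + group * blocks_per_group) * block_size


backup = superblock_byte_offset(1)
print("first backup at byte", backup)
# dd counts skip= and seek= in units of bs, so with bs=1024:
print("dd offset in 1024-byte units:", backup // SUPERBLOCK_SIZE)
print("primary in 1024-byte units:", superblock_byte_offset(0) // SUPERBLOCK_SIZE)
```

Under these assumptions the block number reported by dumpe2fs (32768) is in 4096-byte filesystem blocks, so with bs=1024 the matching dd offset is four times larger, and the primary copy is at offset 1 rather than 0. The e2fsck -b and -B options exist for pointing fsck at a backup superblock and are usually a safer route than copying raw blocks.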
-- Tom From swapana_ghosh at yahoo.com Sun Sep 23 04:34:16 2007 From: swapana_ghosh at yahoo.com (Swapana Ghosh) Date: Sat, 22 Sep 2007 21:34:16 -0700 (PDT) Subject: ext3 file system becoming read only Message-ID: <241858.40398.qm@web58309.mail.re3.yahoo.com> Hi In our office environment few servers mostly database servers and yesterday it happened for one application server(first time) the partion is getting "read only". I was checking the archives, found may be similar kind of issues in the 2007-July archives. But how it has been solved if someone describes me that will be really helpful. In our case, just at the problem started found the line in log file as follows: EXT3-fs error (device dm-12): edxt3_find_entry: reading directory #2015496 offset 2 Then one blank line Then the line is Aborting journal on device dm-12. ext3_abort called Ext3-fs error (device dm-12): ext3_journal_start_sb: Detected aborted journal Remounting filesysem read-only Then the continuous line as follows: EXT3-fs error (device dm-12) in start_transaction: Journal has aborted The above message is continuous until we remount the filesystem and partion becomes 'read-write'. We could not figure it out what is the root cause of the system. We are using individual EMC luns and are configured with LVM volume groups and then mounted on logical volumes. 
Here is the server description:

____________________________________________________________

[root at server ~]# lsmod |grep -i qla
qla2300 130304 0
qla2xxx_conf 305924 0
qla2xxx 307448 21 qla2300
scsi_mod 117709 5 sg,emcp,qla2xxx,cciss,sd_mod

____________________________________________________________
[root at server ~]# cat /etc/modprobe.conf
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias bond0 bonding
alias scsi_hostadapter cciss
options bond0 max_bonds=2 miimon=100 mode=1
alias scsi_hostadapter1 qla2xxx
alias scsi_hostadapter2 qla2xxx_conf
#alias scsi_hostadapter3 qla6312
options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=64 ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=0
install qla2xxx /sbin/modprobe qla2xxx_conf; /sbin/modprobe --ignore-install qla2xxx
remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; }
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP

________________________________________________
[root at server ~]# rpm -qa |grep -i EMC
EMCpower.LINUX-4.5.1-022

________________________________________________
[root at server ~]# rpm -qa|grep -i scli
scli-1.06.16-57

________________________________________________
[root at server ~]# rpm -qa|grep -i nav
naviagentcli-6.19.1.3.0-1

________________________________________________
product: QLA2312 Fibre Channel Adapter

________________________________________________
[root at server ~]# rpm -qa|grep -i lvm
lvm2-2.02.06-6.0.RHEL4
system-config-lvm-1.0.19-1.0

________________________________________________

If I have missed any information, please let me know.

I would really appreciate any hints for solving this issue.

Thanks in advance
-swapana

From darkonc at gmail.com Sun Sep 23 07:27:17 2007
From: darkonc at gmail.com (Stephen Samuel)
Date: Sun, 23 Sep 2007 00:27:17 -0700
Subject: ext3 file system becoming read only
In-Reply-To: <6cd50f9f0709222231s72f9e06cr5c075aaaea45b0b3@mail.gmail.com>
References: <241858.40398.qm@web58309.mail.re3.yahoo.com> <6cd50f9f0709222231s72f9e06cr5c075aaaea45b0b3@mail.gmail.com>
Message-ID: <6cd50f9f0709230027k48591ea3v72706cf65b1eaeb6@mail.gmail.com>

It looks to me like your system is set up so that filesystems revert to read-only on errors. The system has detected an error (possibly a read error) and has now remounted the filesystem read-only.

Check your system logs for errors related to the partition that you're having problems with (this includes the devices underlying the LVM).

Then (presuming that it's not a hardware problem) I'd suggest:
1) back up whatever data you can currently retrieve from the system, then
2) go to single-user mode and manually run fsck on the suspect filesystem.

Also check another relatively recent thread about problems with RAID cards on risers (1U and 2U systems).

On 9/22/07, Swapana Ghosh wrote:
> Hi
>
> In our office environment few servers mostly database servers and yesterday it
> happened
> for one application server(first time) the partition is getting "read only".
>
> I was checking the archives, found may be similar kind of issues in the
> 2007-July archives.
> But how it has been solved if someone describes me that will be really helpful.
>
> In our case, just at the problem started found the line in log file as follows:
>
> EXT3-fs error (device dm-12): ext3_find_entry: reading directory #2015496
> offset 2
> .....
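
A rough sketch of the log check suggested above, assuming the kernel messages are available through dmesg or /var/log/messages (both are assumptions about this particular system). The function name ext3_error_devices is made up for the example; it only pulls the device names out of EXT3-fs error lines, so you know which devices to look at further:

```shell
#!/bin/sh
# Sketch only: list the devices named in EXT3-fs error lines of a kernel log.
# Reads log text on stdin; the function name is hypothetical.
ext3_error_devices() {
    # Match lines like: EXT3-fs error (device dm-12): ext3_find_entry: ...
    # and print just the device name, one per line, without duplicates.
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' | sort -u
}

# Typical use (assumed log sources; adjust to your system):
#   dmesg | ext3_error_devices
#   ext3_error_devices < /var/log/messages
```

For an LVM device such as dm-12, the devices underneath it would still have to be looked up separately (for instance with dmsetup ls), and the same logs searched for errors on those.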
--
Stephen Samuel http://www.bcgreen.com 778-861-7641

From balu.manyam at gmail.com Sun Sep 23 17:42:50 2007
From: balu.manyam at gmail.com (Balu manyam)
Date: Sun, 23 Sep 2007 23:12:50 +0530
Subject: ext3 file system becoming read only
In-Reply-To: <241858.40398.qm@web58309.mail.re3.yahoo.com>
References: <241858.40398.qm@web58309.mail.re3.yahoo.com>
Message-ID: <995392220709231042j19e066f7l4a3944e845887dd5@mail.gmail.com>

What are you using to manage the multipathing to your SAN?

/Balu

From jprats at cesca.es Tue Sep 25 06:28:42 2007
From: jprats at cesca.es (Jordi Prats)
Date: Tue, 25 Sep 2007 08:28:42 +0200
Subject: ext3 file system becoming read only
In-Reply-To: <241858.40398.qm@web58309.mail.re3.yahoo.com>
References: <241858.40398.qm@web58309.mail.re3.yahoo.com>
Message-ID: <46F8AA9A.9060804@cesca.es>

Hi,
This looks like what happened to me. I did the following to solve it:

Mark the filesystem as not having a journal (take it to ext2):

tune2fs -O ^has_journal /dev/cciss/c0d0p2

fsck it to delete the journal:

e2fsck /dev/cciss/c0d0p2

Create the journal again (take it back to ext3):

tune2fs -j /dev/cciss/c0d0p2

and finally, remount it.

In my case it was a local disk, but it should be the same with your SAN disk.

Jordi

Swapana Ghosh wrote:
> Hi
>
> In our office environment few servers mostly database servers and yesterday it
> happened
> for one application server(first time) the partion is getting "read only".
>
> I was checking the archives, found may be similar kind of issues in the
> 2007-July archives.
> But how it has been solved if someone describes me that will be really helpful.
--
......................................................................
         __
        / /          Jordi Prats
  C E / S / C A      Dept. de Sistemes
      /_/            Centre de Supercomputació de Catalunya

  Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
  T. 93 205 6464 · F. 93 205 6979 · jprats at cesca.es
......................................................................

From swapana_ghosh at yahoo.com Tue Sep 25 16:47:50 2007
From: swapana_ghosh at yahoo.com (Swapana Ghosh)
Date: Tue, 25 Sep 2007 09:47:50 -0700 (PDT)
Subject: ext3 file system becoming read only
In-Reply-To: <46F8AA9A.9060804@cesca.es>
Message-ID: <474290.89634.qm@web58307.mail.re3.yahoo.com>

Hi Jordi,

Thanks for your reply. I will test the way you suggested.

Thanks
-swapna

--- Jordi Prats wrote:

> Hi,
> It seems like what it happened to me. I did this to solve this issue:
>
> Mark the filesystem as it does not have a journal (take it to ext2)
>
> tune2fs -O ^has_journal /dev/cciss/c0d0p2
>
> fsck it to delete the journal:
>
> e2fsck /dev/cciss/c0d0p2
>
> Create the journal (take it back to ext3)
>
> tune2fs -j /dev/cciss/c0d0p2
>
> and finaly, remount it.
>
> In my case it was with a local disk, but with your SAN disk should be
> the same.
>
> Jordi

From tweeks at rackspace.com Tue Sep 25 18:27:04 2007
From: tweeks at rackspace.com (tweeks)
Date: Tue, 25 Sep 2007 13:27:04 -0500
Subject: ext3 file system becoming read only
In-Reply-To: <474290.89634.qm@web58307.mail.re3.yahoo.com>
References: <474290.89634.qm@web58307.mail.re3.yahoo.com>
Message-ID: <200709251327.05400.tweeks@rackspace.com>

The EL4 kernel is wacky when it comes to the I/O scheduler locking up and causing ext3 to remount RO. Various hardware hiccups can cause it to go RO.

And when it does, you need to tread lightly or you could lose everything.

If your ext3 filesystem had problems and remounted read-only, I would strongly advise /against/ simply fscking it. Oftentimes when your filesystem has gone RO, it may have been that way for 30 minutes or more. Just rebooting or fscking is a great way to lose everything (i.e. everything being dumped into /lost+found/).

Instead, I would recommend:
1) Reboot into a rescue CD environment (not allowing the rescue environment to mount or fsck your filesystems).
2) Nuke the ext3 journal:
   tune2fs -O ^has_journal /dev/
   (possibly doing the same for other problem partitions)
3) Do a fake fsck to see the extent of damage:
   fsck -fn /dev/
   (after checking things out, use "-fy" once you're sure that it's safe)
4) Rebuild the journal with "tune2fs -j /dev/"
   (rerun at least once until "clean" result is repeatable)
5) Mount and check things out:
   "mkdir /mnt/tmp && mount -t ext3 /dev/ /mnt/tmp"
6) Gracefully umount & reboot:
   "umount /mnt/tmp && shutdown -rf now && exit"

Tweeks

On Tuesday 25 September 2007 11:47, Swapana Ghosh wrote:
> Hi Jordi,
>
> Thanks for your reply. I will test the way you suggested.
>
> Thanks
> -swapna

From tytso at mit.edu Tue Sep 25 11:37:11 2007
From: tytso at mit.edu (Theodore Tso)
Date: Tue, 25 Sep 2007 07:37:11 -0400
Subject: ext3 file system becoming read only
In-Reply-To: <46F8AA9A.9060804@cesca.es>
References: <241858.40398.qm@web58309.mail.re3.yahoo.com> <46F8AA9A.9060804@cesca.es>
Message-ID: <20070925113710.GC21736@thunk.org>

On Tue, Sep 25, 2007 at 08:28:42AM +0200, Jordi Prats wrote:
> It seems like what it happened to me. I did this to solve this issue:
>
> Mark the filesystem as it does not have a journal (take it to ext2)
>
> tune2fs -O ^has_journal /dev/cciss/c0d0p2
>
> fsck it to delete the journal:
>
> e2fsck /dev/cciss/c0d0p2
>
> Create the journal (take it back to ext3)
>
> tune2fs -j /dev/cciss/c0d0p2
>
> and finaly, remount it.
>
> In my case it was with a local disk, but with your SAN disk should be
> the same.

If this helped for you, and e2fsck was formerly not complaining about any problems with the filesystem, it is very likely that you have a bad block on your disk that happened to overlap with the journal. You may want to use smartctl to see how the disk is doing, and replace it if necessary, and use "e2fsck -c" to test for bad blocks and lock them out from being used in the future.

(Note that if smartctl indicates the disk is about to fail, you'll want to omit the e2fsck -c, and instead back up the hard drive and replace it ASAP!)
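
A minimal sketch of the order of checks described above, assuming smartmontools and e2fsprogs are installed and the filesystem in question is unmounted. check_disk_plan and the RUN switch are hypothetical names invented for this example; by default it only prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the checks described above.
# The function name and the RUN switch are assumptions made for this example.
check_disk_plan() {
    device=$1
    if [ "${RUN:-0}" = "1" ]; then
        run=""          # really execute the commands (needs root, unmounted fs)
    else
        run="echo"      # default: only show what would be run
    fi
    # 1. Ask the drive for its overall SMART health verdict.
    #    smartctl exits non-zero when it reports problems, so stop here then.
    $run smartctl -H "$device" || return 1
    # 2. Only if the drive looks healthy: read-only bad block scan; any bad
    #    blocks found are added to the filesystem's bad block list.
    $run e2fsck -c "$device"
}

# Dry run (prints the two commands):
#   check_disk_plan /dev/sdX
```

If the first step fails, the sketch stops before e2fsck -c, in line with the note above about backing up and replacing the disk instead.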
- Ted

From swapana_ghosh at yahoo.com Wed Sep 26 02:56:09 2007
From: swapana_ghosh at yahoo.com (Swapana Ghosh)
Date: Tue, 25 Sep 2007 19:56:09 -0700 (PDT)
Subject: ext3 file system becoming read only
In-Reply-To: <200709251327.05400.tweeks@rackspace.com>
Message-ID: <131827.91597.qm@web58308.mail.re3.yahoo.com>

Hi,

As I explained in my first posting, the read-only issue is not limited to one server. It is happening on a few servers, which are generally Oracle database servers. Very recently it happened to an Oracle application server.

As a temporary measure, we are remounting the filesystem and also running fsck.

While searching the Red Hat knowledge base, I found the following URL. The problem it describes is similar to our issue:

https://bugzilla.redhat.com/show_bug.cgi?id=213921

It says that this is a kernel bug.

We are not sure whether we should move to a newer kernel version or not; please advise.

Thanks

--- tweeks wrote:

> The EL4 kernel is wacky when it comes the the I/O scheduler locking up and
> and
> causing ext3 to remount RO. Various hardware hiccups can cause it to go RO.
>
> And when it does.. you need to tread lightly or you could lose everything.
/etc/modprobe.conf.pp > > > > ###ENDPP > > > > ###BEGINPP > > > > include /etc/modprobe.conf.pp > > > > ###ENDPP > > > > ###BEGINPP > > > > include /etc/modprobe.conf.pp > > > > ###ENDPP > > > > > > > > ________________________________________________ > > > > [root at server ~]# rpm -qa |grep -i EMC > > > > EMCpower.LINUX-4.5.1-022 > > > > > > > > ________________________________________________ > > > > [root at server ~]# rpm -qa|grep -i scli > > > > scli-1.06.16-57 > > > > > > > > ________________________________________________ > > > > [root at server ~]# rpm -qa|grep -i nav > > > > naviagentcli-6.19.1.3.0-1 > > > > > > > > ________________________________________________ > > > > product: QLA2312 Fibre Channel Adapter > > > > > > > > ________________________________________________ > > > > [root at server ~]# rpm -qa|grep -i lvm > > > > lvm2-2.02.06-6.0.RHEL4 > > > > system-config-lvm-1.0.19-1.0 > > > > > > > > ________________________________________________ > > > > > > > > If I missed any info, pl. let me know. > > > > > > > > It would be really appreciated if I get some hints to solve the issues > > > > > > > > Thanks in advance > > > > -swapana
From tweeks at rackspace.com Wed Sep 26 19:06:36 2007 From: tweeks at rackspace.com (tweeks) Date: Wed, 26 Sep 2007 14:06:36 -0500 Subject: ext3 file system becoming read only In-Reply-To: <131827.91597.qm@web58308.mail.re3.yahoo.com> References: <131827.91597.qm@web58308.mail.re3.yahoo.com> Message-ID: <200709261406.36689.tweeks@rackspace.com> On Tuesday 25 September 2007 21:56, Swapana Ghosh wrote: > Hi, > > As I explained in my first posting, the 'read-only' issue is not limited to > one server; it is happening on a few servers, which are generally 'oracle' > database oriented. Very recently it happened to an 'oracle' application > server. As a temporary measure, we are re-mounting the file system and also > running fsck. While searching the Red Hat knowledge base, I found the following > URL; the problem described there is similar to ours: > > https://bugzilla.redhat.com/show_bug.cgi?id=213921 > > It says that it is a kernel bug. > > Not sure whether we should move to a newer kernel version or not; > please advise. The fix for us was to move to the U4.5 or U5 kernel (the latest). Try that on a test system and see if that fixes it for you. Tweeks From adilger at clusterfs.com Thu Sep 27 10:18:48 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Thu, 27 Sep 2007 04:18:48 -0600 Subject: How are alternate superblocks repaired? In-Reply-To: <23459344.1190324177569.JavaMail.root@mswamui-backed.atl.sa.earthlink.net> References: <23459344.1190324177569.JavaMail.root@mswamui-backed.atl.sa.earthlink.net> Message-ID: <20070927101848.GD32520@schatzie.adilger.int> On Sep 20, 2007 17:36 -0400, Thomas Watt wrote: > Using dumpe2fs I have been able to determine that all of my alternate > ext3 superblocks are corrupted (not clean), and only the primary > superblock is valid, i.e. mount works and the ordered journal is applied. > When the primary superblock gets flakey, i.e.
the ext_attr Filesystem > feature goes missing - not sure why this occurs. At this point, the > mount does not apply the journal using the primary superblock and mount > completes without it. Usually, I will resort to booting up the FC3 > OS hard drive on which the ext3 filesystem resides to fix at least the > primary superblock via fsck. Normally the superblock backups are touched only when e2fsck runs. This ensures that the backups are not touched by the kernel and not also "updated" with corruption in case of a software/hardware problem. The only code I know that updates the backup superblocks is the online resizing code, because if it doesn't the backup copies will no longer be useful (i.e. any data written beyond the old end-of-filesystem would be lost). > Will the command: e2fsck -fp /dev/sdb2 repair the alternate superblocks, > and if so, should it only be run from the Live CD environment? Or, > do I need to get into runlevel 1 as single user to issue the command > after unmounting the hard drive in order to run it? The e2fsck should handle it. Can you post the differences between the bad and good superblocks before you do so? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From tango at tiac.net Thu Sep 27 14:39:28 2007 From: tango at tiac.net (Thomas Watt) Date: Thu, 27 Sep 2007 10:39:28 -0400 (EDT) Subject: How are alternate superblocks repaired? Message-ID: <26853540.1190903968996.JavaMail.root@mswamui-andean.atl.sa.earthlink.net> Hi Andreas, Thank you for replying. I have written a metascript which generates a shell script to dump all of the superblock information you have requested, however, only the primary and first backup are compared and then the first backup is compared with the remaining bad backup superblocks. That's not exactly what you requested, but will it suffice? 
If not, I can easily fix it to make the comparisons using the primary (good) superblock as the anchor instead of the (bad) first backup superblock. Outputs are: text, binary, and xxd hex dump formats with separate subdirectories and comparison scripts. I have noticed that a lack of textual difference between broken superblocks does not imply a lack of binary difference, i.e. the binary dumps still differ. I understand that there was a previous kernel bug in the mount command regarding the -o sync option, and that it has been fixed in a subsequent release. Consequently, I am for the moment not using that option on mounting the FC3 disk. In what version of the kernel was the fix for the mount command -o sync bug? I suppose I still have a question: if the online resizing code is the only other code to touch the backup superblocks (out of necessity), then what is the recommended frequency for running e2fsck, and with what parameters, to ensure that the backup superblocks remain in good, usable condition? Should this be done within FC3 natively in single user mode or will a Live CD environment with a newer version of e2fsck be more appropriate? Regards, -- Tom
From adilger at clusterfs.com Thu Sep 27 20:21:35 2007 From: adilger at clusterfs.com (Andreas Dilger) Date: Thu, 27 Sep 2007 14:21:35 -0600 Subject: How are alternate superblocks repaired? In-Reply-To: <26853540.1190903968996.JavaMail.root@mswamui-andean.atl.sa.earthlink.net> References: <26853540.1190903968996.JavaMail.root@mswamui-andean.atl.sa.earthlink.net> Message-ID: <20070927202135.GO32520@schatzie.adilger.int> On Sep 27, 2007 10:39 -0400, Thomas Watt wrote: > I have written a metascript which generates a shell script to dump > all of the superblock information you have requested, however, only > the primary and first backup are compared and then the first backup > is compared with the remaining bad backup superblocks. That's not > exactly what you requested, but will it suffice? Seems fine. Note that it may also be the backup group descriptors that are bad. They are also only updated at e2fsck and online resize time. > I understand that there was a previous kernel bug in the mount command regarding the -o sync option, and that it has been fixed in a subsequent release.
Consequently, I am for the moment not using that option on mounting the FC3 disk. In what version of the kernel was the fix for the mount command -o sync bug? Sorry, no idea. > I suppose I still have a question that if the online resizing code is > the only other code to touch the backup superblocks (out of necessity), > then what is the recommended frequency on running e2fsck with what > parameters to insure that the backup superblocks remain in good, > usable condition? Should this be done within FC3 natively in single > user mode or will a Live CD environment with a newer version of e2fsck > be more appropriate? By default, e2fsck will be run on a filesystem every 20-40 mounts, or 6 months. Some people turn that off, however. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From tango at tiac.net Fri Sep 28 05:18:16 2007 From: tango at tiac.net (Thomas Watt) Date: Fri, 28 Sep 2007 01:18:16 -0400 (GMT-04:00) Subject: How are alternate superblocks repaired? 
Message-ID: <9501720.1190956696834.JavaMail.root@mswamui-bichon.atl.sa.earthlink.net>

Hi Andreas, Here is the textual difference between the primary (good) superblock and backup (bad) superblock #1 and the xxd hex difference between them:

primarysb-bkupsb1-txt.diff:

< Filesystem features: has_journal ext_attr filetype sparse_super
---
> Filesystem features: has_journal filetype sparse_super
8c8
< Filesystem state: clean
---
> Filesystem state: not clean
14,15c14,15
< Free blocks: 13907467
< Free inodes: 9246071
---
> Free blocks: 18818929
> Free inodes: 9568245
24,26c24,26
< Last mount time: Thu Sep 27 23:47:27 2007
< Last write time: Fri Sep 28 00:20:22 2007
< Mount count: 17
---
> Last mount time: n/a
> Last write time: Wed Mar 30 16:59:15 2005
> Mount count: 0
28c28
< Last checked: Tue Sep 18 06:56:43 2007
---
> Last checked: Wed Mar 30 16:58:55 2005
30c30
< Next check after: Sun Mar 16 06:56:43 2008
---
> Next check after: Mon Sep 26 17:58:55 2005

primarysb-bkupsb1-xxd.diff:

< 0000000: 0000 9200 5edc 2301 d197 0e00 0b36 d400 ....^.#......6..
< 0000010: 7715 8d00 0000 0000 0200 0000 0200 0000 w...............
< 0000020: 0080 0000 0080 0000 0040 0000 4f79 fc46 .........@..Oy.F
< 0000030: 0681 fc46 1100 1e00 53ef 0100 0100 0000 ...F....S.......
< 0000040: ebae ef46 004e ed00 0000 0000 0100 0000 ...F.N..........
< 0000050: 0000 0000 0b00 0000 8000 0000 0c00 0000 ................
---
> 0000000: 0000 9200 5edc 2301 d197 0e00 7127 1f01 ....^.#.....q'..
> 0000010: f5ff 9100 0000 0000 0200 0000 0200 0000 ................
> 0000020: 0080 0000 0080 0000 0040 0000 0000 0000 .........@......
> 0000030: 3321 4b42 0000 1e00 53ef 0000 0100 0000 3!KB....S.......
> 0000040: 1f21 4b42 004e ed00 0000 0000 0100 0000 .!KB.N..........
> 0000050: 0000 0000 0b00 0000 8000 0100 0400 0000 ................

The Maximum mount count is 30, and I have no reason to believe that e2fsck has ever been run against this particular FC3 ext filesystem.
I have every reason to believe, however, that fsck has been run on occasion when I either boot the FC3 system manually and the mount count is over 30 or when I experience the situation where the ext_attr goes missing and I then manually boot the system when it is not clean in the primary superblock. The system was created at the end of March, 2005 and as you can see from the differences the backup superblock(s) have never even been touched after their creation. What parameters do you suggest be used when e2fsck is run to repair the backup superblocks? -- Tom From jprats at cesca.es Fri Sep 28 06:25:13 2007 From: jprats at cesca.es (Jordi Prats) Date: Fri, 28 Sep 2007 08:25:13 +0200 Subject: ext3 file system becoming read only In-Reply-To: <131827.91597.qm@web58308.mail.re3.yahoo.com> References: <131827.91597.qm@web58308.mail.re3.yahoo.com> Message-ID: <46FC9E49.6090900@cesca.es> Hi Swapana, An update is always a good idea. On RHEL, updates usually go smoothly, but have you checked your FC switch for errors on each port? You could also check your SAN controllers, or run some diagnostics to be sure it's not a problem on your SAN. If your active controller reboots suddenly, it can cause I/O errors that corrupt your journal. Regards, Jordi Swapana Ghosh wrote: > Hi, > > As I explained in my first posting, the 'read-only' issue is not limited to one > server; it is happening on a few servers, which are generally 'oracle' database > oriented. Very recently it happened to an 'oracle' application server. As a > temporary measure, we are re-mounting the file system and also running fsck. > While searching the Red Hat knowledge base, I found the following URL; the problem > described there is similar to ours: > > https://bugzilla.redhat.com/show_bug.cgi?id=213921 > > It says that it is a kernel bug. > > Not sure whether we should move to a newer kernel version or not; > please advise.
> > Thanks > > > --- tweeks wrote: > > >> The EL4 kernel is wacky when it comes to the I/O scheduler locking up and >> causing ext3 to remount RO. Various hardware hiccups can cause it to go RO. >> >> And when it does.. you need to tread lightly or you could lose everything. >> >> If your ext3 filesystem had problems and remounted read-only, I would strongly >> advise /against/ simply fscking it. Often times when your filesystem has >> gone RO, it may have been that way for 30 minutes or more. Just rebooting or >> fscking is a great way to lose everything (i.e. everything being dumped >> into /lost+found/). >> >> Instead, I would recommend:
>> 1) Reboot into a rescue CD environment (not allowing the rescue environment to mount or fsck your filesystems).
>> 2) Nuke the ext3 journal: tune2fs -O ^has_journal /dev/ (possibly doing the same for other problem partitions)
>> 3) Do a fake fsck to see the extent of damage: fsck -fn /dev/ (after checking things out.. use "-fy" once you're sure that it's safe)
>> 4) Rebuild the journal with "tune2fs -j /dev/" (rerun at least once until "clean" result is repeatable)
>> 5) Mount and check things out: "mkdir /mnt/tmp && mount -t ext3 /dev/ /mnt/tmp"
>> 6) Gracefully umount & reboot: "umount /mnt/tmp && shutdown -rf now && exit"
>> >> Tweeks >> >> On Tuesday 25 September 2007 11:47, Swapana Ghosh wrote: >> >>> Hi Jordi, >>> >>> Thanks for your reply. I will test the way you suggested. >>> >>> Thanks >>> -swapna >>> >>> --- Jordi Prats wrote: >>> >>>> Hi, >>>> It seems like what happened to me. I did this to solve this issue: >>>> >>>> Mark the filesystem as not having a journal (take it to ext2) >>>> >>>> tune2fs -O ^has_journal /dev/cciss/c0d0p2 >>>> >>>> fsck it to delete the journal: >>>> >>>> e2fsck /dev/cciss/c0d0p2 >>>> >>>> Create the journal (take it back to ext3) >>>> >>>> tune2fs -j /dev/cciss/c0d0p2 >>>> >>>> and finally, remount it.
>>>> >>>> In my case it was with a local disk, but with your SAN disk should be >>>> the same. >>>> >>>> Jordi >>>> >>>> Swapana Ghosh wrote: >>>> >>>>> Hi >>>>> >>>>> In our office environment few servers mostly database servers and >>>>> >>>> yesterday it >>>> >>>> >>>>> happened >>>>> for one application server(first time) the partion is getting "read >>>>> only". >>>>> >>>>> I was checking the archives, found may be similar kind of issues in the >>>>> 2007-July archives. >>>>> But how it has been solved if someone describes me that will be really >>>>> >>>> helpful. >>>> >>>> >>>>> In our case, just at the problem started found the line in log file as >>>>> >>>> follows: >>>> >>>>> EXT3-fs error (device dm-12): edxt3_find_entry: reading directory >>>>> >>>> #2015496 >>>> >>>> >>>>> offset 2 >>>>> >>>>> Then one blank line >>>>> Then the line is >>>>> >>>>> Aborting journal on device dm-12. >>>>> ext3_abort called >>>>> >>>>> Ext3-fs error (device dm-12): ext3_journal_start_sb: Detected >>>>> aborted journal >>>>> Remounting filesysem read-only >>>>> >>>>> Then the continuous line as follows: >>>>> >>>>> >>>>> EXT3-fs error (device dm-12) in start_transaction: Journal has >>>>> aborted >>>>> >>>>> >>>>> >>>>> The above message is continuous until we remount the filesystem and >>>>> >>>> partion >>>> >>>> >>>>> becomes >>>>> 'read-write'. >>>>> >>>>> We could not figure it out what is the root cause of the system. >>>>> >>>>> We are using individual EMC luns and are configured with LVM volume >>>>> groups >>>>> >>>> and >>>> >>>> >>>>> then mounted on logical >>>>> volumes. 
>>>>> >>>>> Here i am giving the server description: >>>>> >>>>> ____________________________________________________________ >>>>> >>>>> [root at server ~]# lsmod |grep -i qla >>>>> qla2300 130304 0 >>>>> qla2xxx_conf 305924 0 >>>>> qla2xxx 307448 21 qla2300 >>>>> scsi_mod 117709 5 sg,emcp,qla2xxx,cciss,sd_mod >>>>> >>>>> ____________________________________________________________ >>>>> [root at server ~]# cat /etc/modprobe.conf >>>>> alias eth0 tg3 >>>>> alias eth1 tg3 >>>>> alias eth2 e1000 >>>>> alias eth3 e1000 >>>>> alias eth4 e1000 >>>>> alias eth5 e1000 >>>>> alias bond0 bonding >>>>> alias scsi_hostadapter cciss >>>>> options bond0 max_bonds=2 miimon=100 mode=1 >>>>> alias scsi_hostadapter1 qla2xxx >>>>> alias scsi_hostadapter2 qla2xxx_conf >>>>> #alias scsi_hostadapter3 qla6312 >>>>> options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=64 >>>>> ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=0 >>>>> install qla2xxx /sbin/modprobe qla2xxx_conf; /sbin/modprobe >>>>> --ignore-install qla2xxx >>>>> remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx >>>>> && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; } >>>>> ###BEGINPP >>>>> include /etc/modprobe.conf.pp >>>>> ###ENDPP >>>>> ###BEGINPP >>>>> include /etc/modprobe.conf.pp >>>>> ###ENDPP >>>>> ###BEGINPP >>>>> include /etc/modprobe.conf.pp >>>>> ###ENDPP >>>>> >>>>> ________________________________________________ >>>>> [root at server ~]# rpm -qa |grep -i EMC >>>>> EMCpower.LINUX-4.5.1-022 >>>>> >>>>> ________________________________________________ >>>>> [root at server ~]# rpm -qa|grep -i scli >>>>> scli-1.06.16-57 >>>>> >>>>> ________________________________________________ >>>>> [root at server ~]# rpm -qa|grep -i nav >>>>> naviagentcli-6.19.1.3.0-1 >>>>> >>>>> ________________________________________________ >>>>> product: QLA2312 Fibre Channel Adapter >>>>> >>>>> ________________________________________________ >>>>> [root at server ~]# rpm -qa|grep -i lvm >>>>> 
lvm2-2.02.06-6.0.RHEL4 >>>>> system-config-lvm-1.0.19-1.0 >>>>> >>>>> ________________________________________________ >>>>> >>>>> If I missed any info, pl. let me know. >>>>> >>>>> It would be really appreciated if I get some hints to solve the issues >>>>> >>>>> Thanks in advance >>>>> -swapana > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users
--
......................................................................
Jordi Prats
Dept. de Sistemes
Centre de Supercomputació de Catalunya (CESCA)
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 205 6464 · F. 93 205 6979 · jprats at cesca.es
......................................................................
From tytso at mit.edu Fri Sep 28 18:55:48 2007 From: tytso at mit.edu (Theodore Tso) Date: Fri, 28 Sep 2007 14:55:48 -0400 Subject: How are alternate superblocks repaired? In-Reply-To: <9501720.1190956696834.JavaMail.root@mswamui-bichon.atl.sa.earthlink.net> References: <9501720.1190956696834.JavaMail.root@mswamui-bichon.atl.sa.earthlink.net> Message-ID: <20070928185548.GH8688@thunk.org> On Fri, Sep 28, 2007 at 01:18:16AM -0400, Thomas Watt wrote: > The Maximum mount count is 30, and I have no reason to believe that > e2fsck has ever been run against this particular FC3 ext filesystem.
> I have every reason to believe, however, that fsck has been run on > occasion when I either boot the FC3 system manually and the mount > count is over 30 or when I experience the situation where the > ext_attr goes missing and I then manually boot the system when it is > not clean in the primary superblock. The system was created at the > end of March, 2005 and as you can see from the differences the > backup superblock(s) have never even been touched after their > creation. > > What parameters do you suggest be used when e2fsck is run to repair > the backup superblocks? Hi Tom, There are a couple of things going on here. First of all, out of general paranoia, neither e2fsck nor the kernel touches the backup superblocks. Most of the changes that you pointed out between the primary and backup superblocks are no big deal, and can easily be regenerated by e2fsck. The one exception is the feature bitmasks. Most of the time it's only tune2fs which makes changes to the feature compatibility bitmasks. Unfortunately, the kernel does make some changes "behind the user's back"; and one of them is the ext_attr feature flag. So thanks for pointing that out, and I'll have to make an enhancement to e2fsck to detect if the backup superblock's compatibility flags are different, and if so, to update the backup superblocks. For now, you can work around this and force an update to the backup superblocks by running the following command as root: e2label /dev/hdXXX "`e2label /dev/hdXXX`" This reads out the label from the filesystem, and then sets the label to its current value. This will force a copy from the primary to the backup superblocks. Regards, - Ted From tango at tiac.net Fri Sep 28 21:28:18 2007 From: tango at tiac.net (Thomas Watt) Date: Fri, 28 Sep 2007 17:28:18 -0400 (GMT-04:00) Subject: How are alternate superblocks repaired?
Message-ID: <12298429.1191014898941.JavaMail.root@mswamui-billy.atl.sa.earthlink.net> Hi Ted, Thanks for the workaround, I appreciate it very much. Cheers, -- Tom
From tango at tiac.net Sat Sep 29 07:29:13 2007 From: tango at tiac.net (Thomas Watt) Date: Sat, 29 Sep 2007 03:29:13 -0400 (GMT-04:00) Subject: How are alternate superblocks repaired? Message-ID: <28995484.1191050953557.JavaMail.root@mswamui-billy.atl.sa.earthlink.net> Hi Ted, I just wanted to give you some feedback on running the e2label command to fix the problem of backup superblock inconsistency with the primary superblock. Since Linux filesystem name labels are optional and my filesystem volume name was not set, I wondered if that would make a difference. It did not. I did not opt to set a label, but just followed your suggested command. The following fields were updated: Filesystem features, Free blocks, Free inodes, Last mount time, Last write time, Mount count, Last checked, Next check after. The only field not updated was the Filesystem state field. So, all of the backup superblocks remain "not clean" and are now at least a lot closer to being consistent with the primary superblock - just not quite there yet as far as being usable in case the primary superblock gets hosed. At this point I don't suppose there is any way for e2fsck to make the backup superblocks "clean" (i.e. only when the primary is clean) until your enhancement gets released. It was fairly easy to make this assessment using the script I wrote to dump all of the superblocks and make the comparisons of before and after superblock states. Checking the result was the easy part. I want to make a few changes, test them out and donate the script to the e2fsprogs project.
It should make it just a little bit easier for system administrators to keep an eye on the backup superblocks, and you also might find it useful in testing your enhancement to e2fsck. The only caveat is that the script has not been tested on ext2/ext3 filesystems with block sizes of 1024 or 2048. There are provisions for 1024 and 2048 block-sized systems - that's the speculative part of the script that needs testing - assumptions always need testing/challenging - right? :) I hope this feedback helps in your enhancement efforts to e2fsck. Regards, -- Tom
From tytso at mit.edu Sat Sep 29 13:01:21 2007 From: tytso at mit.edu (Theodore Tso) Date: Sat, 29 Sep 2007 09:01:21 -0400 Subject: How are alternate superblocks repaired? In-Reply-To: <28995484.1191050953557.JavaMail.root@mswamui-billy.atl.sa.earthlink.net> References: <28995484.1191050953557.JavaMail.root@mswamui-billy.atl.sa.earthlink.net> Message-ID: <20070929130121.GA1541@thunk.org> On Sat, Sep 29, 2007 at 03:29:13AM -0400, Thomas Watt wrote: > The only field not updated was the Filesystem state field. So, all > of the backup superblocks remain "not clean" and are now at least a > lot closer to being consistent with the primary superblock - just > not quite there yet as far as being usable in case the primary > superblock gets hosed. That's by design. The backup superblocks always have the filesystem state set to "not clean". They are written out that way! Keep in mind that the kernel does *not* update the backup superblocks under normal operations. So by definition, fields such as the free blocks, free inodes, last mount time, mount count, are always going to be out of date in the backup superblocks. AND THAT'S OK.
The whole point of the backup superblocks is to have an extra copy of the fundamental filesystem parameters --- the blocksize, the number of inodes per block group, the block group size, the location of the inode table and the allocation bitmaps, and so on. That doesn't change under normal circumstances except when the filesystem is resized, so that's why it's OK for the kernel to not bother to update them. If the primary superblock is destroyed, e2fsck will use the backup superblocks to reconstruct the filesystem, and in the process of reconstructing the filesystem, it will update the free blocks, free inodes, and the other more transient portions of the filesystem. I'm not sure why you are so concerned about keeping every last field in the backup superblocks identical to that of the primary. There are lots of good reasons why they are not the same; the less they are modified, the less likely they are to get corrupted or otherwise messed up. (For example, in addition to avoiding a much longer umount operation, the fact that the kernel never writes the backup superblocks means that we don't have to worry about what happens if the in-memory copy of the superblock is corrupted --- say because the system administrator was too cheap to use ECC memory --- even if the corruption is written to the primary, the backups will still be OK for e2fsck to use for recovery purposes.) - Ted
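Note for readers of the archive: the xxd rows Tom posted earlier in this thread can be decoded by hand to see which differences between the primary and backup superblock are transient counters and which are feature flags. The following is a minimal Python sketch, assuming the usual ext2/ext3 on-disk superblock layout (little-endian fields at the offsets given in the comments). The function and table names are illustrative only and are not part of e2fsprogs; only the two compat flags relevant to this thread are listed.

```python
import struct

# First 0x60 bytes of each superblock, transcribed from the xxd diff
# posted in this thread (primary vs. backup #1).
PRIMARY_HEX = """
0000 9200 5edc 2301 d197 0e00 0b36 d400
7715 8d00 0000 0000 0200 0000 0200 0000
0080 0000 0080 0000 0040 0000 4f79 fc46
0681 fc46 1100 1e00 53ef 0100 0100 0000
ebae ef46 004e ed00 0000 0000 0100 0000
0000 0000 0b00 0000 8000 0000 0c00 0000
"""
BACKUP_HEX = """
0000 9200 5edc 2301 d197 0e00 7127 1f01
f5ff 9100 0000 0000 0200 0000 0200 0000
0080 0000 0080 0000 0040 0000 0000 0000
3321 4b42 0000 1e00 53ef 0000 0100 0000
1f21 4b42 004e ed00 0000 0000 0100 0000
0000 0000 0b00 0000 8000 0100 0400 0000
"""

# (name, byte offset, struct format); all fields are little-endian.
FIELDS = [
    ("free_blocks", 0x0C, "<I"),
    ("free_inodes", 0x10, "<I"),
    ("mount_count", 0x34, "<H"),
    ("max_mount_count", 0x36, "<H"),
    ("magic", 0x38, "<H"),
    ("state", 0x3A, "<H"),
    ("block_group_nr", 0x5A, "<H"),
    ("feature_compat", 0x5C, "<I"),
]

# Subset of the compat feature bits: just the two discussed here.
COMPAT_FLAGS = {0x0004: "has_journal", 0x0008: "ext_attr"}


def decode(hex_text):
    """Decode selected superblock fields from whitespace-separated hex."""
    raw = bytes.fromhex("".join(hex_text.split()))
    out = {name: struct.unpack_from(fmt, raw, off)[0] for name, off, fmt in FIELDS}
    # Bit 0 of the state field is the "valid/clean" flag.
    out["clean"] = bool(out["state"] & 0x0001)
    out["features"] = sorted(
        name for bit, name in COMPAT_FLAGS.items() if out["feature_compat"] & bit
    )
    return out


def differing_fields(a, b):
    """Return the names of decoded fields whose values differ."""
    return sorted(key for key in a if a[key] != b[key])


primary = decode(PRIMARY_HEX)
backup = decode(BACKUP_HEX)
for key in differing_fields(primary, backup):
    print(f"{key}: primary={primary[key]!r} backup={backup[key]!r}")
```

Run against the posted bytes, the decoded free block, free inode and mount count values agree with the dumpe2fs text diff in the same message, and the only compat feature bit that differs is ext_attr, which is the flag Ted describes the kernel setting behind the user's back.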