From evilninja at gmx.net Fri Jul 1 00:20:13 2005 From: evilninja at gmx.net (evilninja) Date: Fri, 01 Jul 2005 02:20:13 +0200 Subject: [Q] Is errors=panic safe to use, and will it detect a RAID gone psycho? In-Reply-To: References: Message-ID: <42C48C3D.50305@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Maurice Volaski wrote: > If I set the error behavior with tune2fs to panic, would this happen? > That is, is this the type of error that would trigger a panic? Are there > minor errors that could unnecessarily trigger one? panic seems to be triggered in fs/ext3/super.c, there are some conditions to be met in ext3_handle_error() and ext3_abort(). either get a clue from that or perhaps some ext2 guru can comment as to *when* exactly a panic is triggered ;-) - -- BOFH excuse #12: dry joints on cable plug -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxIw9C/PVm5+NVoYRAqVRAJ4iTIbmFvi1OoqqcZPyuFtzeo7OkQCg2fAO kIOJsD6artMMh49BIYfj1Ks= =thhq -----END PGP SIGNATURE----- From evilninja at gmx.net Fri Jul 1 00:23:35 2005 From: evilninja at gmx.net (evilninja) Date: Fri, 01 Jul 2005 02:23:35 +0200 Subject: Assertion failure in do_get_write_access() In-Reply-To: <20050626151224.L25249-100000@xs3.xs4all.nl> References: <20050626151224.L25249-100000@xs3.xs4all.nl> Message-ID: <42C48D07.9090801@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yuri van Oers wrote: > Hi, > > I just had my server cry this out to the console: > > Assertion failure in do_get_write_access() at transaction.c:658: > "jh->b_transaction == journal->j_committing_transaction" > kernel BUG at transaction.c:658! is it reproducible? does it happen with later/earlier kernels too? was the fs corrupted after this? if yes, did fsck fix it? - -- BOFH excuse #115: your keyboard's space bar is generating spurious keycodes.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxI0HC/PVm5+NVoYRAmNdAJ0Z3LgMzliSJoRks23q2h/4F+j9wQCfc/Lx 6wXFBsuKbsRZaDXk2nImZps= =bXLa -----END PGP SIGNATURE----- From evilninja at gmx.net Fri Jul 1 00:38:04 2005 From: evilninja at gmx.net (evilninja) Date: Fri, 01 Jul 2005 02:38:04 +0200 Subject: How to figure out underlying failed disk(parttions) and sector(s) position ??? In-Reply-To: <20050628220151.27698.qmail@web30202.mail.mud.yahoo.com> References: <20050628220151.27698.qmail@web30202.mail.mud.yahoo.com> Message-ID: <42C4906C.2010008@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 ha haha wrote: > Jun 21 16:55:09 host1 kernel: end_request: I/O error, > dev 03:0b (hda), sector 196487120 afaict, this message comes from drivers/block/ll_rw_blk.c, so it is not really ext2 specific and should probably go to lkml (care to send a patch?) i think the reiserfs folks had something similar in the works for reiserfs-specific error messages but i can't remember the name of the (mail)thread. > Q3: what does the "high=13, low=16200426" means? again not ext2 specific, as this is from drivers/ide/ide-lib.c.
- -- BOFH excuse #47: Complete Transient Lockout -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxJBsC/PVm5+NVoYRAkdFAJ455ZWjNo9XulUZy2wJl+6H4AMqZwCg1fZ0 zzlr1OYCyAiLMJXG1Fyi5Cc= =09yx -----END PGP SIGNATURE----- From bunk at stusta.de Sat Jul 2 23:51:12 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 3 Jul 2005 01:51:12 +0200 Subject: [2.6 patch] fs/jbd/: possible cleanups Message-ID: <20050702235112.GK5346@stusta.de> This patch contains the following possible cleanups: - make needlessly global functions static - journal.c: remove the unused global function __journal_internal_check and move the check to journal_init - remove the following write-only global variable: - journal.c: current_journal - remove the following unneeded EXPORT_SYMBOL's: - journal.c: journal_check_used_features - journal.c: journal_recover Signed-off-by: Adrian Bunk --- This patch was already sent on: - 14 Jun 2005 fs/jbd/journal.c | 41 ++++++++++++++++++----------------------- fs/jbd/revoke.c | 3 ++- include/linux/jbd.h | 3 --- 3 files changed, 20 insertions(+), 27 deletions(-) --- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old 2005-06-14 03:58:20.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h 2005-06-14 04:00:56.000000000 +0200 @@ -900,8 +900,6 @@ int start, int len, int bsize); extern journal_t * journal_init_inode (struct inode *); extern int journal_update_format (journal_t *); -extern int journal_check_used_features - (journal_t *, unsigned long, unsigned long, unsigned long); extern int journal_check_available_features (journal_t *, unsigned long, unsigned long, unsigned long); extern int journal_set_features @@ -914,7 +912,6 @@ extern int journal_skip_recovery (journal_t *); extern void journal_update_superblock (journal_t *, int); extern void __journal_abort_hard (journal_t *); -extern void __journal_abort_soft (journal_t *, int); extern void journal_abort (journal_t 
*, int); extern int journal_errno (journal_t *); extern void journal_ack_err (journal_t *); --- linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c.old 2005-06-14 03:57:39.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c 2005-06-14 04:08:24.000000000 +0200 @@ -59,13 +59,11 @@ EXPORT_SYMBOL(journal_init_dev); EXPORT_SYMBOL(journal_init_inode); EXPORT_SYMBOL(journal_update_format); -EXPORT_SYMBOL(journal_check_used_features); EXPORT_SYMBOL(journal_check_available_features); EXPORT_SYMBOL(journal_set_features); EXPORT_SYMBOL(journal_create); EXPORT_SYMBOL(journal_load); EXPORT_SYMBOL(journal_destroy); -EXPORT_SYMBOL(journal_recover); EXPORT_SYMBOL(journal_update_superblock); EXPORT_SYMBOL(journal_abort); EXPORT_SYMBOL(journal_errno); @@ -81,6 +79,7 @@ EXPORT_SYMBOL(journal_force_commit); static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); +static void __journal_abort_soft (journal_t *journal, int errno); /* * Helper function used to manage commit timeouts @@ -93,16 +92,6 @@ wake_up_process(p); } -/* Static check for data structure consistency. There's no code - * invoked --- we'll just get a linker failure if things aren't right. - */ -void __journal_internal_check(void) -{ - extern void journal_bad_superblock_size(void); - if (sizeof(struct journal_superblock_s) != 1024) - journal_bad_superblock_size(); -} - /* * kjournald: The main thread function used to manage a logging device * journal. @@ -119,16 +108,12 @@ * known as checkpointing, and this thread is responsible for that job. */ -journal_t *current_journal; // AKPM: debug - -int kjournald(void *arg) +static int kjournald(void *arg) { journal_t *journal = (journal_t *) arg; transaction_t *transaction; struct timer_list timer; - current_journal = journal; - daemonize("kjournald"); /* Set up an interval timer which can be used to trigger a @@ -1181,8 +1166,10 @@ * features. Return true (non-zero) if it does. 
**/ -int journal_check_used_features (journal_t *journal, unsigned long compat, - unsigned long ro, unsigned long incompat) +static int journal_check_used_features (journal_t *journal, + unsigned long compat, + unsigned long ro, + unsigned long incompat) { journal_superblock_t *sb; @@ -1439,7 +1426,7 @@ * device this journal is present. */ -const char *journal_dev_name(journal_t *journal, char *buffer) +static const char *journal_dev_name(journal_t *journal, char *buffer) { struct block_device *bdev; @@ -1485,7 +1472,7 @@ /* Soft abort: record the abort error status in the journal superblock, * but don't do any other IO. */ -void __journal_abort_soft (journal_t *journal, int errno) +static void __journal_abort_soft (journal_t *journal, int errno) { if (journal->j_flags & JFS_ABORT) return; @@ -1888,7 +1875,7 @@ static struct proc_dir_entry *proc_jbd_debug; -int read_jbd_debug(char *page, char **start, off_t off, +static int read_jbd_debug(char *page, char **start, off_t off, int count, int *eof, void *data) { int ret; @@ -1898,7 +1885,7 @@ return ret; } -int write_jbd_debug(struct file *file, const char __user *buffer, +static int write_jbd_debug(struct file *file, const char __user *buffer, unsigned long count, void *data) { char buf[32]; @@ -1987,6 +1974,14 @@ { int ret; +/* Static check for data structure consistency. There's no code + * invoked --- we'll just get a linker failure if things aren't right. 
+ */ + extern void journal_bad_superblock_size(void); + if (sizeof(struct journal_superblock_s) != 1024) + journal_bad_superblock_size(); + + ret = journal_init_caches(); if (ret != 0) journal_destroy_caches(); --- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old 2005-06-14 03:58:36.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c 2005-06-14 03:58:41.000000000 +0200 @@ -116,7 +116,8 @@ (block << (hash_shift - 12))) & (table->hash_size - 1); } -int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq) +static int insert_revoke_hash(journal_t *journal, unsigned long blocknr, + tid_t seq) { struct list_head *hash_list; struct jbd_revoke_record_s *record; From lists at wolfram.schlich.org Fri Jul 8 16:03:37 2005 From: lists at wolfram.schlich.org (Wolfram Schlich) Date: Fri, 8 Jul 2005 18:03:37 +0200 Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible? Message-ID: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org> Hi, I accidentally issued "mkswap" on a used ext3 fs partition (~30G) :-/ I have analyzed the behaviour of mkswap using two test files and it appears to only change "some" bytes: --8<-- --- swap2.xxd 2005-07-04 21:00:10.157261360 +0200 +++ swap1.xxd 2005-07-04 21:00:01.894517488 +0200 @@ -62,7 +62,7 @@ 00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ -0000400: 0000 0000 0000 0000 0000 0000 0000 0000 ................ +0000400: 0100 0000 ff09 0000 0000 0000 0000 0000 ................ 0000410: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000420: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000430: 0000 0000 0000 0000 0000 0000 0000 0000 ................ @@ -253,7 +253,7 @@ 0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 
0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ -0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ +0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532 ......SWAPSPACE2 0001000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0001010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0001020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ --8<-- I created an image (hdb1.img) of the damaged partition using dd and tried to work with various tools on it. Here is the output of 'fsck.ext3 -n -v hdb1.img': --8<-- e2fsck 1.35 (28-Feb-2004) Couldn't find ext2 superblock, trying backup blocks... hdb1.img was not cleanly unmounted, check forced. Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Free blocks count wrong for group #0 (24043, counted=0). Fix? no Free blocks count wrong for group #1 (32250, counted=0). Fix? no Free blocks count wrong for group #2 (32253, counted=0). Fix? no Free blocks count wrong for group #3 (32250, counted=158). Fix? no Free blocks count wrong for group #4 (32253, counted=8). Fix? no Free blocks count wrong for group #5 (32250, counted=28). Fix? no Free blocks count wrong for group #6 (32253, counted=6822). Fix? no Free blocks count wrong for group #7 (32250, counted=10428). Fix? no Free blocks count wrong for group #8 (32253, counted=11170). Fix? no Free blocks count wrong for group #9 (32250, counted=4239). Fix? no Free blocks count wrong for group #10 (32253, counted=24482). Fix? no Free blocks count wrong for group #11 (32253, counted=21184). Fix? no Free blocks count wrong for group #12 (32253, counted=25657). Fix? no Free blocks count wrong for group #13 (32253, counted=13674). Fix? no Free blocks count wrong for group #14 (32253, counted=15007). Fix? no Free blocks count wrong for group #15 (32253, counted=11366). Fix? 
no [ removed many lines, complete log file at http://wolfram.schlich.org/tmp/fsck.ext3_-n_-v_hdb1.img ] Free inodes count wrong for group #213 (16416, counted=14498). Fix? no Directories count wrong for group #213 (0, counted=241). Fix? no Free inodes count wrong for group #214 (16416, counted=14524). Fix? no Directories count wrong for group #214 (0, counted=126). Fix? no Free inodes count wrong for group #215 (16416, counted=14441). Fix? no Directories count wrong for group #215 (0, counted=114). Fix? no Free inodes count wrong for group #216 (16416, counted=15214). Fix? no Directories count wrong for group #216 (0, counted=99). Fix? no Free inodes count wrong for group #217 (16416, counted=14898). Fix? no Directories count wrong for group #217 (0, counted=216). Fix? no Free inodes count wrong for group #218 (16416, counted=14878). Fix? no Directories count wrong for group #218 (0, counted=187). Fix? no Free inodes count wrong for group #219 (16416, counted=16033). Fix? no Directories count wrong for group #219 (0, counted=37). Fix? no Free inodes count wrong for group #220 (16416, counted=14949). Fix? no Directories count wrong for group #220 (0, counted=128). Fix? no Free inodes count wrong for group #221 (16416, counted=15167). Fix? no Directories count wrong for group #221 (0, counted=102). Fix? no Free inodes count wrong for group #222 (16416, counted=15908). Fix? no Directories count wrong for group #222 (0, counted=79). Fix? no Free inodes count wrong for group #223 (16416, counted=14719). Fix? no Directories count wrong for group #223 (0, counted=117). Fix? no Free inodes count wrong for group #224 (16416, counted=14212). Fix? no Directories count wrong for group #224 (0, counted=165). Fix? no Free inodes count wrong for group #225 (16416, counted=14104). Fix? no Directories count wrong for group #225 (0, counted=118). Fix? no Free inodes count wrong for group #226 (16416, counted=14634). Fix? no Directories count wrong for group #226 (0, counted=227). 
Fix? no Free inodes count wrong for group #227 (16416, counted=14616). Fix? no Directories count wrong for group #227 (0, counted=198). Fix? no Free inodes count wrong for group #228 (16416, counted=14622). Fix? no Directories count wrong for group #228 (0, counted=139). Fix? no Free inodes count wrong (3759253, counted=3348765). Fix? no hdb1.img: ********** WARNING: Filesystem still has errors ********** 11 inodes used (0%) 7136 non-contiguous inodes (64872.7%) # of inodes with ind/dind/tind blocks: 50303/388/0 126175 blocks used (1%) 0 bad blocks 0 large files 377976 regular files 30868 directories 0 character device files 0 block device files 13 fifos 48 links 1632 symbolic links (1631 fast symbolic links) 0 sockets -------- 410537 files --8<-- I guess if I would let fsck "fix" it, the damage would be bigger than the benefit -- those numbers look scary to me: --8<-- 11 inodes used (0%) 7136 non-contiguous inodes (64872.7%) # of inodes with ind/dind/tind blocks: 50303/388/0 126175 blocks used (1%) --8<-- I haven't even tried to have it "fix" the fs on the image file. What do you think? e2salvage doesn't recognize any superblocks (not even the backup superblocks dumpe2fs happily uses to display some information on the fs image), maybe because it's not an ext2 but ext3 fs. Well. Currently I'm running e2retrieve on the image to see whether this is able to do some recovery, but no results yet. Any suggestions? It's hard to believe that those few changed bytes should make the whole fs unrecoverable, isn't it? Thanks in advance! -- Wolfram Schlich From adilger at clusterfs.com Fri Jul 8 16:51:37 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Fri, 8 Jul 2005 10:51:37 -0600 Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible? 
In-Reply-To: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org> References: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org> Message-ID: <20050708165137.GB5335@schatzie.adilger.int> On Jul 08, 2005 18:03 +0200, Wolfram Schlich wrote: > I accidentally issued "mkswap" on a used ext3 fs partition (~30G) :-/ > > I have analyzed the behaviour of mkswap using two test files and it > appears to only change "some" bytes: > --8<-- > --- swap2.xxd 2005-07-04 21:00:10.157261360 +0200 > +++ swap1.xxd 2005-07-04 21:00:01.894517488 +0200 > @@ -62,7 +62,7 @@ > 00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > -0000400: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > +0000400: 0100 0000 ff09 0000 0000 0000 0000 0000 ................ > 0000410: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0000420: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0000430: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > @@ -253,7 +253,7 @@ > 0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > -0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > +0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532 ......SWAPSPACE2 > 0001000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0001010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > 0001020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ > --8<-- Try starting with a test file which is not all zero (e.g. copy from /dev/urandom) and see how much is changed. > Any suggestions? It's hard to believe that those few changed bytes > should make the whole fs unrecoverable, isn't it? 
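For reference, the two changed regions shown in the diff above can be decoded with a short sketch. It assumes that the eight bytes at 0x400 are two little-endian 32-bit fields (the names version and last_page are labels chosen here, not taken from this thread), and it relies on the fact that the primary ext2/ext3 superblock starts at byte offset 1024, which is the same offset 0x400. That overlap would be consistent with e2fsck printing "Couldn't find ext2 superblock, trying backup blocks..." while the backup superblocks stay usable.

```python
import struct

# Bytes copied from the "+0000400:" line of the xxd diff in this thread.
changed = bytes.fromhex("0100 0000 ff09 0000".replace(" ", ""))

# Assumption: two little-endian unsigned 32-bit fields.
# The names are illustrative labels, not confirmed by the thread.
version, last_page = struct.unpack("<II", changed)

# The primary ext2/ext3 superblock starts 1024 bytes into the partition.
EXT2_SUPERBLOCK_OFFSET = 1024
CHANGED_OFFSET = 0x400

# "SWAPSPACE2" starts at 0xff6 in the "+0000ff0:" line of the diff.
SIGNATURE = b"SWAPSPACE2"
SIGNATURE_OFFSET = 0xFF6
signature_end = SIGNATURE_OFFSET + len(SIGNATURE)

print("version field:", version)
print("last_page field:", last_page)
print("overlaps primary superblock:", CHANGED_OFFSET == EXT2_SUPERBLOCK_OFFSET)
print("signature ends at byte:", signature_end)
```

If the layout assumption holds, the test file was 2560 pages long and the signature ends exactly at a 4096-byte boundary. A diff against an all-zero test file can only show bytes that became non-zero, so it says nothing about bytes that were overwritten with zeros.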
Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From hahaha_30k at yahoo.com Fri Jul 8 17:33:19 2005 From: hahaha_30k at yahoo.com (ha haha) Date: Fri, 8 Jul 2005 10:33:19 -0700 (PDT) Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible? In-Reply-To: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org> Message-ID: <20050708173319.57811.qmail@web30202.mail.mud.yahoo.com>
Try to follow the steps below:
1, Save the entire contents of the partition to another hard disk, or to a big file, so that your test work will not destroy any useful data. dd if= of=
2, Run " mkfs.ext3 -n " to get the list of superblock backup locations (with -n, nothing is actually written to the device). For example: Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 2388787
3, Use one of the superblock backups near the end of the partition to recover the file system structure. e2fsck -b 20480000
4, If the above doesn't work, try the following to recover as much file content as possible: dd if= | strings > allStrings.txt Then read through the resulting big file (allStrings.txt) and recover paragraphs.
--- Wolfram Schlich wrote: > Hi, > > I accidentally issued "mkswap" on a used ext3 fs > partition (~30G) :-/ > > I have analyzed the behaviour of mkswap using two > test files and it > appears to only change "some" bytes: > --8<-- > --- swap2.xxd 2005-07-04 21:00:10.157261360 +0200 > +++ swap1.xxd 2005-07-04 21:00:01.894517488 +0200 > @@ -62,7 +62,7 @@ > 00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 > ................ > 00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 > ................ > 00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 > ................ > -0000400: 0000 0000 0000 0000 0000 0000 0000 0000 > ................ > +0000400: 0100 0000 ff09 0000 0000 0000 0000 0000 > ................ > 0000410: 0000 0000 0000 0000 0000 0000 0000 0000 > ................
> === message truncated === From adilger at clusterfs.com Fri Jul 8 18:20:25 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Fri, 8 Jul 2005 12:20:25 -0600 Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible? In-Reply-To: <20050708173319.57811.qmail@web30202.mail.mud.yahoo.com> References: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org> <20050708173319.57811.qmail@web30202.mail.mud.yahoo.com> Message-ID: <20050708182025.GC5335@schatzie.adilger.int> On Jul 08, 2005 10:33 -0700, ha haha wrote: > > I accidentally issued "mkswap" on a used ext3 fs > > partition (~30G) :-/ > > > > I have analyzed the behaviour of mkswap using two > > test files and it > > appears to only change "some" bytes: > > --8<-- > > --- swap2.xxd 2005-07-04 21:00:10.157261360 +0200 > > +++ swap1.xxd 2005-07-04 21:00:01.894517488 +0200 > > @@ -62,7 +62,7 @@ > > 00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > -0000400: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > +0000400: 0100 0000 ff09 0000 0000 0000 0000 0000 > > ................ > > 0000410: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0000420: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0000430: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > @@ -253,7 +253,7 @@ > > 0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > -0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................
> > +0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532 > > ......SWAPSPACE2 > > 0001000: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0001010: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > 0001020: 0000 0000 0000 0000 0000 0000 0000 0000 > > ................ > > --8<-- > > > > Here is the output of 'fsck.ext3 -n -v hdb1.img': > > --8<-- > > e2fsck 1.35 (28-Feb-2004) > > Couldn't find ext2 superblock, trying backup > > blocks... > > hdb1.img was not cleanly unmounted, check forced. > > Pass 1: Checking inodes, blocks, and sizes > > Pass 2: Checking directory structure > > Pass 3: Checking directory connectivity > > Pass 4: Checking reference counts > > Pass 5: Checking group summary information > > Free blocks count wrong for group #0 (24043, > > counted=0). > > Fix? no > > > > Free blocks count wrong for group #1 (32250, > > counted=0). > > Fix? no I should have read the original email more carefully. This is only the block count summary, which is never up to date in a backup superblock and its group descriptors. The check shows everything is OK. A good time to make a backup! Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From menscher at uiuc.edu Fri Jul 8 18:55:05 2005 From: menscher at uiuc.edu (Damian Menscher) Date: Fri, 8 Jul 2005 13:55:05 -0500 (CDT) Subject: [Q] Is this true and does it mean there is dynamic entation in ext2/3? In-Reply-To: <42B7A0A6.7010806@iki.fi> References: <20050618191451.GC16314@thunk.org> <42B7A0A6.7010806@iki.fi> Message-ID: [Resurrecting an ancient thread...] On Tue, 21 Jun 2005, Markus Peuhkuri wrote: > Theodore Ts'o wrote: > >> Ext2/3 has advanced algorithms to make sure that the blocks that are >> allocated avoid fragmentation, but it is not doing any kind of dynamic >> > And there is a tool 'filefrag' in e2fsprogs that reports how fragmented > a particular file is. If your disk grows full (over 90-95%, depending > on file sizes etc..)
then it is more difficult to find continuous blocks > for files. Now, if you delete files, then new files most probably are > non-fragmented but those files that were written when disk was full are > still fragmented. > > You can "unfragment" those files just by copying them and deleting old > ones (if you have plenty of free space), but as Damian told, you must be > careful with locks and nfs handles. I have a user who is complaining of general slowness on his machine (RH9, so e2fsprogs-1.32-6). The slowness is particularly bad when he types "mail" to read his email. I can't find anything wrong with the system itself (no cpu load, free ram, no heavy disk activity, etc) but did note that when opening his mailbox (only 26M, so nothing huge) it does hang for 5-10 secs, apparently waiting on disk. Running filefrag on his mailbox indicates it has 1119 extents. I suspect that may be the source of the problem. Now, I realize some fragmentation is normal, but I'd have expected that a mailbox with 328 messages wouldn't get more than 328 fragments (or possibly twice that many) maximum. And the filesystem is only 42% full, so that's not causing extra fragmentation. Is there anything I should be watching for here, or should I just give up and copy the file? Damian Menscher -- -=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=- -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=- -=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=- -=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=- -=#| The above opinions are not necessarily those of my employers. |#=- From menscher at uiuc.edu Fri Jul 8 21:55:55 2005 From: menscher at uiuc.edu (Damian Menscher) Date: Fri, 8 Jul 2005 16:55:55 -0500 (CDT) Subject: filesystem fragmentation stats? Message-ID: Let me preface this by saying "Yes, I know *nix filesystems don't need to worry about fragmentation". 
That said, is there a way to check the overall level of fragmentation of a live ext3 filesystem? I know about filefrag, but that's for specific files. And I think e2fsck tells you, but only if you take the filesystem offline for the scan. Is there anything that will give a percentage for a *live* filesystem? (I have a fs that's been at >95% usage for quite some time, and I want to check for any fragmentation that could have resulted.) Damian Menscher -- -=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=- -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=- -=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=- -=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=- -=#| The above opinions are not necessarily those of my employers. |#=- From evilninja at gmx.net Fri Jul 8 22:09:00 2005 From: evilninja at gmx.net (evilninja) Date: Sat, 09 Jul 2005 00:09:00 +0200 Subject: filesystem fragmentation stats? In-Reply-To: References: Message-ID: <42CEF97C.4080807@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Damian Menscher wrote: > Let me preface this by saying "Yes, I know *nix filesystems don't need > to worry about fragmentation". this was discussed *very* and again in june by Theodore: https://www.redhat.com/archives/ext3-users/2005-June/msg00026.html > That said, is there a way to check the overall level of fragmentation of > a live ext3 filesystem? I know about filefrag, but that's for specific > files. And I think e2fsck tells you, but only if you take the > filesystem offline for the scan. "tune2fs -l" tells you about "Fragments per group", and "fsck.ext2 -nv" opens the fs read-only and print some nice stats after that.
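A minimal sketch of how the fragmentation figure could be pulled out of that summary, assuming the output uses the "inodes used" and "non-contiguous inodes" wording that e2fsck 1.35 printed earlier in this archive. The function name is made up for illustration, and the ratio (non-contiguous inodes divided by inodes used) is inferred from the numbers quoted in this thread rather than from any documentation.

```python
import re


def noncontiguous_percent(summary: str) -> float:
    """Return non-contiguous inodes as a percentage of inodes used.

    Assumes the e2fsck summary wording seen in this archive; other
    e2fsprogs versions may word these lines differently.
    """
    used = int(re.search(r"(\d+) inodes used", summary).group(1))
    fragmented = int(re.search(r"(\d+) non-contiguous inodes", summary).group(1))
    return 100.0 * fragmented / used


# Summary lines copied from the fsck.ext3 -n -v output posted earlier.
sample = "11 inodes used (0%) 7136 non-contiguous inodes (64872.7%)"
print(round(noncontiguous_percent(sample), 1))
```

Running e2fsck with -n against a mounted filesystem reads it while it may be changing, so the counts are only a rough indication.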
- -- BOFH excuse #341: HTTPD Error 666 : BOFH was here -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCzvl8C/PVm5+NVoYRAvc7AJwKRTYhWussiZquiawLNZzjSnSJ7ACg7uoU 39MB4i90ajg+ckER52pqfZ4= =LFM+ -----END PGP SIGNATURE----- From evilninja at gmx.net Fri Jul 8 22:10:09 2005 From: evilninja at gmx.net (evilninja) Date: Sat, 09 Jul 2005 00:10:09 +0200 Subject: filesystem fragmentation stats? In-Reply-To: <42CEF97C.4080807@gmx.net> References: <42CEF97C.4080807@gmx.net> Message-ID: <42CEF9C1.7050908@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 evilninja wrote: > Damian Menscher wrote: > >>>Let me preface this by saying "Yes, I know *nix filesystems don't need >>>to worry about fragmentation". > > > this was discussed *very* and again in june by Theodore: - ---------------------------^ often ;-) - -- BOFH excuse #422: Someone else stole your IP address, call the Internet detectives! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCzvnAC/PVm5+NVoYRAgfTAKDuGNL6CpEcjdIKIVkW8vNBg0+7OACg1L90 7qB6it50hyYlY7hIYq3Gx4U= =omCQ -----END PGP SIGNATURE----- From menscher at uiuc.edu Sat Jul 9 07:15:41 2005 From: menscher at uiuc.edu (Damian Menscher) Date: Sat, 9 Jul 2005 02:15:41 -0500 (CDT) Subject: filesystem fragmentation stats? In-Reply-To: <42CEF97C.4080807@gmx.net> References: <42CEF97C.4080807@gmx.net> Message-ID: On Sat, 9 Jul 2005, evilninja wrote: > Damian Menscher wrote: >> That said, is there a way to check the overall level of fragmentation of >> a live ext3 filesystem? I know about filefrag, but that's for specific >> files. And I think e2fsck tells you, but only if you take the >> filesystem offline for the scan. > > "tune2fs -l" tells you about "Fragments per group", and "fsck.ext2 -nv" > opens the fs read-only and print some nice stats after that.

I noticed the "Fragments per group", but haven't been able to find
anything that documents what it means.  Could someone here comment?

Running e2fsck -nvf gave the info I was looking for:

On my mail partition:
      93 inodes used (0%)
      30 non-contiguous inodes (32.3%)
         # of inodes with ind/dind/tind blocks: 44/17/0
   60540 blocks used (45%)
       0 bad blocks
       0 large files

      80 regular files
       4 directories
--------
      84 files

On my home partition:
  191326 inodes used (7%)
   11084 non-contiguous inodes (5.8%)
         # of inodes with ind/dind/tind blocks: 21649/707/0
 4581594 blocks used (86%)
       0 bad blocks
       0 large files

  174096 regular files
   15787 directories
       5 fifos
     274 links
    1421 symbolic links (1356 fast symbolic links)
       8 sockets
--------
  191591 files

So, interestingly, the home directories haven't gotten too fragmented
despite being at >90% usage for several months (much of that time at
>95%).  Apparently the 5% reserved for the system factors in here.
That's certainly a relief!

On the other hand, the mail spools are getting horribly fragmented
(>30%), probably because mail programs are deleting messages out of the
middle of the spools?  It's hard to imagine any other reason for 30%
fragmentation on a filesystem that's less than half full.  (Another
system I manage also shows high fragmentation for the mail spool, so I
think this must be a generic problem.)

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers.
|#=- From bunk at stusta.de Tue Jul 12 20:27:42 2005 From: bunk at stusta.de (Adrian Bunk) Date: Tue, 12 Jul 2005 22:27:42 +0200 Subject: [2.6 patch] fs/jbd/: possible cleanups Message-ID: <20050712202742.GM4034@stusta.de> This patch contains the following possible cleanups: - make needlessly global functions static - journal.c: remove the unused global function __journal_internal_check and move the check to journal_init - remove the following write-only global variable: - journal.c: current_journal - remove the following unneeded EXPORT_SYMBOL's: - journal.c: journal_check_used_features - journal.c: journal_recover Signed-off-by: Adrian Bunk --- This patch was already sent on: - 3 Jul 2005 - 14 Jun 2005 fs/jbd/journal.c | 41 ++++++++++++++++++----------------------- fs/jbd/revoke.c | 3 ++- include/linux/jbd.h | 3 --- 3 files changed, 20 insertions(+), 27 deletions(-) --- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old 2005-06-14 03:58:20.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h 2005-06-14 04:00:56.000000000 +0200 @@ -900,8 +900,6 @@ int start, int len, int bsize); extern journal_t * journal_init_inode (struct inode *); extern int journal_update_format (journal_t *); -extern int journal_check_used_features - (journal_t *, unsigned long, unsigned long, unsigned long); extern int journal_check_available_features (journal_t *, unsigned long, unsigned long, unsigned long); extern int journal_set_features @@ -914,7 +912,6 @@ extern int journal_skip_recovery (journal_t *); extern void journal_update_superblock (journal_t *, int); extern void __journal_abort_hard (journal_t *); -extern void __journal_abort_soft (journal_t *, int); extern void journal_abort (journal_t *, int); extern int journal_errno (journal_t *); extern void journal_ack_err (journal_t *); --- linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c.old 2005-06-14 03:57:39.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c 2005-06-14 04:08:24.000000000 +0200 @@ -59,13 +59,11 @@ 
EXPORT_SYMBOL(journal_init_dev); EXPORT_SYMBOL(journal_init_inode); EXPORT_SYMBOL(journal_update_format); -EXPORT_SYMBOL(journal_check_used_features); EXPORT_SYMBOL(journal_check_available_features); EXPORT_SYMBOL(journal_set_features); EXPORT_SYMBOL(journal_create); EXPORT_SYMBOL(journal_load); EXPORT_SYMBOL(journal_destroy); -EXPORT_SYMBOL(journal_recover); EXPORT_SYMBOL(journal_update_superblock); EXPORT_SYMBOL(journal_abort); EXPORT_SYMBOL(journal_errno); @@ -81,6 +79,7 @@ EXPORT_SYMBOL(journal_force_commit); static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); +static void __journal_abort_soft (journal_t *journal, int errno); /* * Helper function used to manage commit timeouts @@ -93,16 +92,6 @@ wake_up_process(p); } -/* Static check for data structure consistency. There's no code - * invoked --- we'll just get a linker failure if things aren't right. - */ -void __journal_internal_check(void) -{ - extern void journal_bad_superblock_size(void); - if (sizeof(struct journal_superblock_s) != 1024) - journal_bad_superblock_size(); -} - /* * kjournald: The main thread function used to manage a logging device * journal. @@ -119,16 +108,12 @@ * known as checkpointing, and this thread is responsible for that job. */ -journal_t *current_journal; // AKPM: debug - -int kjournald(void *arg) +static int kjournald(void *arg) { journal_t *journal = (journal_t *) arg; transaction_t *transaction; struct timer_list timer; - current_journal = journal; - daemonize("kjournald"); /* Set up an interval timer which can be used to trigger a @@ -1181,8 +1166,10 @@ * features. Return true (non-zero) if it does. **/ -int journal_check_used_features (journal_t *journal, unsigned long compat, - unsigned long ro, unsigned long incompat) +static int journal_check_used_features (journal_t *journal, + unsigned long compat, + unsigned long ro, + unsigned long incompat) { journal_superblock_t *sb; @@ -1439,7 +1426,7 @@ * device this journal is present. 
*/ -const char *journal_dev_name(journal_t *journal, char *buffer) +static const char *journal_dev_name(journal_t *journal, char *buffer) { struct block_device *bdev; @@ -1485,7 +1472,7 @@ /* Soft abort: record the abort error status in the journal superblock, * but don't do any other IO. */ -void __journal_abort_soft (journal_t *journal, int errno) +static void __journal_abort_soft (journal_t *journal, int errno) { if (journal->j_flags & JFS_ABORT) return; @@ -1888,7 +1875,7 @@ static struct proc_dir_entry *proc_jbd_debug; -int read_jbd_debug(char *page, char **start, off_t off, +static int read_jbd_debug(char *page, char **start, off_t off, int count, int *eof, void *data) { int ret; @@ -1898,7 +1885,7 @@ return ret; } -int write_jbd_debug(struct file *file, const char __user *buffer, +static int write_jbd_debug(struct file *file, const char __user *buffer, unsigned long count, void *data) { char buf[32]; @@ -1987,6 +1974,14 @@ { int ret; +/* Static check for data structure consistency. There's no code + * invoked --- we'll just get a linker failure if things aren't right. 
+ */ + extern void journal_bad_superblock_size(void); + if (sizeof(struct journal_superblock_s) != 1024) + journal_bad_superblock_size(); + + ret = journal_init_caches(); if (ret != 0) journal_destroy_caches(); --- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old 2005-06-14 03:58:36.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c 2005-06-14 03:58:41.000000000 +0200 @@ -116,7 +116,8 @@ (block << (hash_shift - 12))) & (table->hash_size - 1); } -int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq) +static int insert_revoke_hash(journal_t *journal, unsigned long blocknr, + tid_t seq) { struct list_head *hash_list; struct jbd_revoke_record_s *record; From adilger at clusterfs.com Tue Jul 12 22:32:44 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 12 Jul 2005 16:32:44 -0600 Subject: [2.6 patch] fs/jbd/: possible cleanups In-Reply-To: <20050712202742.GM4034@stusta.de> References: <20050712202742.GM4034@stusta.de> Message-ID: <20050712223243.GW5335@schatzie.adilger.int> On Jul 12, 2005 22:27 +0200, Adrian Bunk wrote: > - make needlessly global functions static I had previously commented on this patch: > - journal.c: remove the unused global function __journal_internal_check > and move the check to journal_init I don't mind removing this function, but it shouldn't be put inside #ifdef JBD_DEBUG, as that would remove the check from the compiler-parsed code and defeat the purpose of the check. > - remove the following write-only global variable: > - journal.c: current_journal Seems fine. > - remove the following unneeded EXPORT_SYMBOL's: > - journal.c: journal_check_used_features Should be kept for API completeness. > - remove the following unneeded EXPORT_SYMBOL's: > - journal.c: journal_recover Doesn't appear usable in any case, should be removed. 

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
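
For readers following the __journal_internal_check discussion: the code in the patch calls an undefined extern function inside an "if" whose condition is a compile-time constant, so a wrong superblock size shows up as a linker failure. That only works if the compiler sees the check and discards the dead branch, which is why the check must not sit behind an #ifdef. The sketch below shows the same idea in userspace C. It swaps in a different technique (a negative array size, similar in spirit to the kernel's BUILD_BUG_ON) so that it does not depend on optimization. The names demo_superblock, DEMO_SB_SIZE, DEMO_BUILD_BUG_ON and demo_init are hypothetical and are not part of jbd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk size, standing in for the 1024 bytes that the
 * patch expects of struct journal_superblock_s. */
#define DEMO_SB_SIZE 1024

/* Hypothetical stand-in structure, padded to exactly DEMO_SB_SIZE. */
struct demo_superblock {
	uint32_t magic;
	uint32_t blocktype;
	uint32_t sequence;
	uint8_t  padding[DEMO_SB_SIZE - 3 * sizeof(uint32_t)];
};

/* Assumption: not the jbd mechanism. If cond is true the array size
 * becomes -1 and compilation fails; if false it is char[1] and the
 * whole expression generates no code. */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Stand-in for an init function such as journal_init(): the check lives
 * in code that is always compiled, not behind a debug #ifdef. */
int demo_init(void)
{
	DEMO_BUILD_BUG_ON(sizeof(struct demo_superblock) != DEMO_SB_SIZE);

	/* Runtime double check, only for this demonstration. */
	assert(sizeof(struct demo_superblock) == DEMO_SB_SIZE);
	return 0;
}
```

Changing the padding so the structure is no longer 1024 bytes should make the build fail at the DEMO_BUILD_BUG_ON line rather than at link time.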

From bunk at stusta.de Tue Jul 12 22:43:53 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Wed, 13 Jul 2005 00:43:53 +0200
Subject: [2.6 patch] fs/jbd/: possible cleanups
In-Reply-To: <20050712223243.GW5335@schatzie.adilger.int>
References: <20050712202742.GM4034@stusta.de> <20050712223243.GW5335@schatzie.adilger.int>
Message-ID: <20050712224353.GN4034@stusta.de>

On Tue, Jul 12, 2005 at 04:32:44PM -0600, Andreas Dilger wrote:
> On Jul 12, 2005 22:27 +0200, Adrian Bunk wrote:
>...
> > - journal.c: remove the unused global function __journal_internal_check
> >   and move the check to journal_init
>
> I don't mind removing this function, but it shouldn't be put inside #ifdef
> JBD_DEBUG, as that would remove the check from the compiler-parsed code
> and defeat the purpose of the check.

???

That's not what my patch is doing.

journal_init() is not inside an #ifdef JBD_DEBUG.

>...
> > - remove the following unneeded EXPORT_SYMBOL's:
> >   - journal.c: journal_check_used_features
>
> Should be kept for API completeness.
>...

The function itself isn't removed.

Does it really have to stay exported, or isn't it enough to re-export it
when a user appears?

> Cheers, Andreas

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S.
Buck - Dragon Seed From adilger at clusterfs.com Tue Jul 12 23:05:39 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 12 Jul 2005 17:05:39 -0600 Subject: [2.6 patch] fs/jbd/: possible cleanups In-Reply-To: <20050712224353.GN4034@stusta.de> References: <20050712202742.GM4034@stusta.de> <20050712223243.GW5335@schatzie.adilger.int> <20050712224353.GN4034@stusta.de> Message-ID: <20050712230539.GX5335@schatzie.adilger.int> On Jul 13, 2005 00:43 +0200, Adrian Bunk wrote: > On Tue, Jul 12, 2005 at 04:32:44PM -0600, Andreas Dilger wrote: > > I don't mind removing this function, but it shouldn't be put inside #ifdef > > JBD_DEBUG, as that would remove the check from the compiler-parsed code > > and defeat the purpose of the check. > > That's not what my patch is doing. > > journal_init() is not inside an #ifdef JBD_DEBUG. My bad. You didn't generate diff with -p (which I normally do and is incredibly useful when reviewing patches) and I saw "write_jbd_debug()" above and my brain went on autopilot assuming the code had moved into that function. Objection withdrawn. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From jwbaker at acm.org Thu Jul 14 00:12:26 2005 From: jwbaker at acm.org (Jeffrey W. Baker) Date: Wed, 13 Jul 2005 17:12:26 -0700 Subject: a comparison of ext3, jfs, and xfs on hardware raid Message-ID: <1121299946.20950.26.camel@toonses.gghcwest.com> I'm setting up a new file server and I just can't seem to get the expected performance from ext3. Unfortunately I'm stuck with ext3 due to my use of Lustre. So I'm hoping you dear readers will send me some tips for increasing ext3 performance. The system is using an Areca hardware raid controller with 5 7200RPM SATA disks. The RAID controller has 128MB of cache and the disks each have 8MB. The cache is write-back. The system is Linux 2.6.12 on amd64 with 1GB system memory. 

Using bonnie++ with a 10GB fileset, in MB/s:

          ext3   jfs   xfs
Read       112   188   141
Write       97   157   167
Rewrite     51    71    60

These numbers were obtained using the mkfs defaults for all filesystems
and the deadline scheduler.  As you can see JFS is kicking butt on this
test.

Next I used pgbench to test parallel random I/O.  pgbench has a
configurable number of clients and transactions per client, and can
change the size of its database.  I used a database of 100 million
tuples (scale factor 1000).  I timed 100,000 transactions on each
filesystem, with 10 and 100 clients per run.  Figures are in
transactions per second.

              ext3   jfs   xfs
10 Clients      55    81    68
100 Clients     61   100    64

Here XFS is not substantially faster but JFS continues to lead.

JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
bonnie++ linear I/O.

Are there any tunables that I might want to adjust to get better
performance from ext3?

-jwb

From adilger at clusterfs.com Thu Jul 14 18:33:30 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 14 Jul 2005 12:33:30 -0600
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <1121299946.20950.26.camel@toonses.gghcwest.com>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
Message-ID: <20050714183330.GN5335@schatzie.adilger.int>

On Jul 13, 2005 17:12 -0700, Jeffrey W. Baker wrote:
> I'm setting up a new file server and I just can't seem to get the
> expected performance from ext3.  Unfortunately I'm stuck with ext3 due
> to my use of Lustre.  So I'm hoping you dear readers will send me some
> tips for increasing ext3 performance.
>
> The system is using an Areca hardware raid controller with 5 7200RPM
> SATA disks.  The RAID controller has 128MB of cache and the disks each
> have 8MB.  The cache is write-back.  The system is Linux 2.6.12 on amd64
> with 1GB system memory.
> > Using bonnie++ with a 10GB fileset, in MB/s: > > ext3 jfs xfs > Read 112 188 141 > Write 97 157 167 > Rewrite 51 71 60 > > These number were obtained using the mkfs defaults for all filesystems > and the deadline scheduler. As you can see JFS is kicking butt on this > test. One thing that is important for Lustre is performance of EAs. See http://samba.org/~tridge/xattr_results/ for a comparison. Lustre uses large inodes (-I 256 or larger) to store the EAs efficiently. > Next I used pgbench to test parallel random I/O. pgbench has > configurable number of clients and transactions per client, and can > change the size of its database. I used a database of 100 million > tuples (scale factor 1000). I times 100,000 transactions on each > filesystem, with 10 and 100 clients per run. Figures are in > transactions per second. > > ext3 jfs xfs > 10 Clients 55 81 68 > 100 Clients 61 100 64 > > Here XFS is not substantially faster but JFS continues to lead. > > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on > bonnie++ linear I/O. This is a bit surprising, I've never heard JFS as a leader in many performance tests. Is pgbench at all related to dbench? The problem with dbench is that for cases where the filesystem does no IO at all it reports a best result. In real life the data has to make it to disk at some point. See http://sudhaa.com/~benchmark/ext3/newtiobenchresults.ext3gold/newtiobench/newtiobench.html for a comparison of ext3, xfs, jfs in the mode that Lustre runs in (specifically column 7, 14, 18). > Are there any tunables that I might want to adjust to get better > performance from ext3? Try creating your ext3 filesystem with a larger journal, as Lustre does: mkfs -J size=400 ... size is in MB, 400 might be excessive for your setup - I'd be interested in hearing where the "sweet spot" is for journal size. The latest e2fsprogs use 128MB as the largest default size (up from 32MB) for large filesystems. 
Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From sonny at burdell.org Thu Jul 14 18:53:16 2005 From: sonny at burdell.org (Sonny Rao) Date: Thu, 14 Jul 2005 14:53:16 -0400 Subject: a comparison of ext3, jfs, and xfs on hardware raid In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int> References: <1121299946.20950.26.camel@toonses.gghcwest.com> <20050714183330.GN5335@schatzie.adilger.int> Message-ID: <20050714185316.GA25794@kevlar.burdell.org> On Thu, Jul 14, 2005 at 12:33:30PM -0600, Andreas Dilger wrote: > On Jul 13, 2005 17:12 -0700, Jeffrey W. Baker wrote: > > I'm setting up a new file server and I just can't seem to get the > > expected performance from ext3. Unfortunately I'm stuck with ext3 due > > to my use of Lustre. So I'm hoping you dear readers will send me some > > tips for increasing ext3 performance. > > > > The system is using an Areca hardware raid controller with 5 7200RPM > > SATA disks. The RAID controller has 128MB of cache and the disks each > > have 8MB. The cache is write-back. The system is Linux 2.6.12 on amd64 > > with 1GB system memory. > > > > Using bonnie++ with a 10GB fileset, in MB/s: > > > > ext3 jfs xfs > > Read 112 188 141 > > Write 97 157 167 > > Rewrite 51 71 60 > > > > These number were obtained using the mkfs defaults for all filesystems > > and the deadline scheduler. As you can see JFS is kicking butt on this > > test. > > One thing that is important for Lustre is performance of EAs. See > http://samba.org/~tridge/xattr_results/ for a comparison. Lustre > uses large inodes (-I 256 or larger) to store the EAs efficiently. > > > Next I used pgbench to test parallel random I/O. pgbench has > > configurable number of clients and transactions per client, and can > > change the size of its database. I used a database of 100 million > > tuples (scale factor 1000). I times 100,000 transactions on each > > filesystem, with 10 and 100 clients per run. 
Figures are in > > transactions per second. > > > > ext3 jfs xfs > > 10 Clients 55 81 68 > > 100 Clients 61 100 64 > > > > Here XFS is not substantially faster but JFS continues to lead. > > > > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on > > bonnie++ linear I/O. > > This is a bit surprising, I've never heard JFS as a leader in many > performance tests. Is pgbench at all related to dbench? The problem > with dbench is that for cases where the filesystem does no IO at all > it reports a best result. In real life the data has to make it to > disk at some point. JFS tends to lead in two areas, low cpu utilization compared to other filesystems, and on a new filesystem, layout is generally very good. The low CPU utilization helps in environments where you have a lot of filesystems or just a lot of I/O going on, we've seen on SPEC SFS that JFS tends to be the best because of that. (Yes, SPEC SFS is a rather crazy workload, but then so are a lot of other common ones) JFS's main weak point is on meta-data intensive workloads (like dbench) because of deficiencies in the logging system and some poorly placed synchronous operations which are currently being tackled. We've also been slowly pushing in changes to improve JFS performance, some of them have made it into 2.6.12. Sonny From jwbaker at acm.org Thu Jul 14 18:56:15 2005 From: jwbaker at acm.org (Jeffrey W. Baker) Date: Thu, 14 Jul 2005 11:56:15 -0700 Subject: a comparison of ext3, jfs, and xfs on hardware raid In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int> References: <1121299946.20950.26.camel@toonses.gghcwest.com> <20050714183330.GN5335@schatzie.adilger.int> Message-ID: <1121367375.20950.64.camel@toonses.gghcwest.com> On Thu, 2005-07-14 at 12:33 -0600, Andreas Dilger wrote: > On Jul 13, 2005 17:12 -0700, Jeffrey W. 
Baker wrote: > > Using bonnie++ with a 10GB fileset, in MB/s: > > > > ext3 jfs xfs > > Read 112 188 141 > > Write 97 157 167 > > Rewrite 51 71 60 > > > > These number were obtained using the mkfs defaults for all filesystems > > and the deadline scheduler. As you can see JFS is kicking butt on this > > test. > > One thing that is important for Lustre is performance of EAs. See > http://samba.org/~tridge/xattr_results/ for a comparison. Lustre > uses large inodes (-I 256 or larger) to store the EAs efficiently. This is of importance for only the metadata backend, or for OSTs as well? > > Next I used pgbench to test parallel random I/O. pgbench has > > configurable number of clients and transactions per client, and can > > change the size of its database. I used a database of 100 million > > tuples (scale factor 1000). I times 100,000 transactions on each > > filesystem, with 10 and 100 clients per run. Figures are in > > transactions per second. > > > > ext3 jfs xfs > > 10 Clients 55 81 68 > > 100 Clients 61 100 64 > > > > Here XFS is not substantially faster but JFS continues to lead. > > > > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on > > bonnie++ linear I/O. > > This is a bit surprising, I've never heard JFS as a leader in many > performance tests. Is pgbench at all related to dbench? The problem > with dbench is that for cases where the filesystem does no IO at all > it reports a best result. In real life the data has to make it to > disk at some point. pgbench comes in postgresql's contrib. Believe me, the filesystem does plenty of I/O. It sustains roughly 600 iops for 15-20 minutes. The "scale factor of 1000" means pgbench is using a database with 100 million tuples, or about 16GB of data. The entire run uses up only about 2 minutes of CPU time. 
> > See http://sudhaa.com/~benchmark/ext3/newtiobenchresults.ext3gold/newtiobench/newtiobench.html > for a comparison of ext3, xfs, jfs in the mode that Lustre runs in > (specifically column 7, 14, 18). > > > Are there any tunables that I might want to adjust to get better > > performance from ext3? > > Try creating your ext3 filesystem with a larger journal, as Lustre does: > > mkfs -J size=400 ... > > size is in MB, 400 might be excessive for your setup - I'd be interested > in hearing where the "sweet spot" is for journal size. The latest e2fsprogs > use 128MB as the largest default size (up from 32MB) for large filesystems. I intend to run many more benchmarks using various ext3 mount options. I'll make sure to modulate the journal size as well. However, it is my impression that mballoc/delalloc/extents will be of use mainly to workloads like tarring and untarring a large archive. For linear reads of one giant file, will these mount options make any difference? Regards, Jeffrey From sonny at burdell.org Thu Jul 14 23:49:29 2005 From: sonny at burdell.org (Sonny Rao) Date: Thu, 14 Jul 2005 19:49:29 -0400 Subject: a comparison of ext3, jfs, and xfs on hardware raid In-Reply-To: <1121367375.20950.64.camel@toonses.gghcwest.com> References: <1121299946.20950.26.camel@toonses.gghcwest.com> <20050714183330.GN5335@schatzie.adilger.int> <1121367375.20950.64.camel@toonses.gghcwest.com> Message-ID: <20050714234929.GA27538@kevlar.burdell.org> On Thu, Jul 14, 2005 at 11:56:15AM -0700, Jeffrey W. Baker wrote: > I intend to run many more benchmarks using various ext3 mount options. > I'll make sure to modulate the journal size as well. However, it is my > impression that mballoc/delalloc/extents will be of use mainly to > workloads like tarring and untarring a large archive. For linear reads > of one giant file, will these mount options make any difference? 
The difference they will make will be in terms of file layout, because they will give you better layout during creation which will give you higher sustained throughput during your linear reads. Check out the ext2-devel mailing list back in Feb-March of this year for some benchmark info about the difference these options make on sequential read/write tests. Sonny From jwbaker at acm.org Sat Jul 16 17:37:52 2005 From: jwbaker at acm.org (Jeffrey W. Baker) Date: Sat, 16 Jul 2005 10:37:52 -0700 Subject: a comparison of ext3, jfs, and xfs on hardware raid In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int> References: <1121299946.20950.26.camel@toonses.gghcwest.com> <20050714183330.GN5335@schatzie.adilger.int> Message-ID: <1121535472.7101.37.camel@noodles> On Thu, 2005-07-14 at 12:33 -0600, Andreas Dilger wrote: > On Jul 13, 2005 17:12 -0700, Jeffrey W. Baker wrote: ... > > The system is using an Areca hardware raid controller with 5 7200RPM > > SATA disks. The RAID controller has 128MB of cache and the disks each > > have 8MB. The cache is write-back. The system is Linux 2.6.12 on amd64 > > with 1GB system memory. ... > > Next I used pgbench to test parallel random I/O. pgbench has > > configurable number of clients and transactions per client, and can > > change the size of its database. I used a database of 100 million > > tuples (scale factor 1000). I times 100,000 transactions on each > > filesystem, with 10 and 100 clients per run. Figures are in > > transactions per second. > > > > ext3 jfs xfs > > 10 Clients 55 81 68 > > 100 Clients 61 100 64 > > > > Here XFS is not substantially faster but JFS continues to lead. > > > > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on > > bonnie++ linear I/O. > > This is a bit surprising, I've never heard JFS as a leader in many > performance tests. Is pgbench at all related to dbench? The problem > with dbench is that for cases where the filesystem does no IO at all > it reports a best result. 
In real life the data has to make it to > disk at some point. ... > Try creating your ext3 filesystem with a larger journal, as Lustre does: > > mkfs -J size=400 ... > > size is in MB, 400 might be excessive for your setup - I'd be interested > in hearing where the "sweet spot" is for journal size. The latest e2fsprogs > use 128MB as the largest default size (up from 32MB) for large filesystems. The journal size doesn't seem to make any difference to pgbench, except that 256MB seems to be the worst. 400MB and 32MB are roughly equal on the pgbench workload. 400MB was the optimal journal size on the bonnie ++ workload. Perhaps it is silly to benchmark a database with its journal files on a journalling filesystem, but here is the result. journal pgbench tps bonnie++ MB/s -------------------------------------------------------- size | mode | 1 | 10 | 100 | write | rewrite | read -------------------------------------------------------- 32 journal 57 35 112 32 ordered 28 51 57 83 33 101 32 writeback 34 70 88 57 31 103 64 journal 55 33 113 64 ordered 29 52 61 84 33 100 64 writeback 32 69 87 59 31 100 128 journal 52 33 109 128 ordered 32 54 62 86 34 102 128 writeback 34 70 88 61 32 102 256 journal 54 30 110 256 ordered 28 51 60 90 34 106 256 writeback 29 64 79 59 31 104 400 journal 52 28 108 400 ordered 26 49 59 89 33 104 400 writeback 32 70 87 60 32 101 --- ext2 105 118 32 107 -jwb From bunk at stusta.de Tue Jul 19 14:15:25 2005 From: bunk at stusta.de (Adrian Bunk) Date: Tue, 19 Jul 2005 16:15:25 +0200 Subject: [2.6 patch] fs/jbd/: cleanups Message-ID: <20050719141525.GJ5031@stusta.de> This patch contains the following cleanups: - make needlessly global functions static - journal.c: remove the unused global function __journal_internal_check and move the check to journal_init - remove the following write-only global variable: - journal.c: current_journal - remove the following unneeded EXPORT_SYMBOL: - journal.c: journal_recover Signed-off-by: Adrian Bunk --- 
fs/jbd/journal.c | 34 ++++++++++++++-------------------- fs/jbd/revoke.c | 3 ++- include/linux/jbd.h | 1 - 3 files changed, 16 insertions(+), 22 deletions(-) --- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old 2005-06-14 03:58:20.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h 2005-06-14 04:00:56.000000000 +0200 @@ -914,7 +912,6 @@ extern int journal_skip_recovery (journal_t *); extern void journal_update_superblock (journal_t *, int); extern void __journal_abort_hard (journal_t *); -extern void __journal_abort_soft (journal_t *, int); extern void journal_abort (journal_t *, int); extern int journal_errno (journal_t *); extern void journal_ack_err (journal_t *); --- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old 2005-06-14 03:58:36.000000000 +0200 +++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c 2005-06-14 03:58:41.000000000 +0200 @@ -116,7 +116,8 @@ (block << (hash_shift - 12))) & (table->hash_size - 1); } -int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq) +static int insert_revoke_hash(journal_t *journal, unsigned long blocknr, + tid_t seq) { struct list_head *hash_list; struct jbd_revoke_record_s *record; --- linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c.old 2005-07-19 15:53:16.000000000 +0200 +++ linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c 2005-07-19 15:53:39.000000000 +0200 @@ -65,7 +65,6 @@ EXPORT_SYMBOL(journal_set_features); EXPORT_SYMBOL(journal_create); EXPORT_SYMBOL(journal_load); EXPORT_SYMBOL(journal_destroy); -EXPORT_SYMBOL(journal_recover); EXPORT_SYMBOL(journal_update_superblock); EXPORT_SYMBOL(journal_abort); EXPORT_SYMBOL(journal_errno); @@ -81,6 +80,7 @@ EXPORT_SYMBOL(journal_try_to_free_buffer EXPORT_SYMBOL(journal_force_commit); static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); +static void __journal_abort_soft (journal_t *journal, int errno); /* * Helper function used to manage commit timeouts @@ -93,16 +93,6 @@ static void commit_timeout(unsigned long wake_up_process(p); } 
-/* Static check for data structure consistency. There's no code - * invoked --- we'll just get a linker failure if things aren't right. - */ -void __journal_internal_check(void) -{ - extern void journal_bad_superblock_size(void); - if (sizeof(struct journal_superblock_s) != 1024) - journal_bad_superblock_size(); -} - /* * kjournald: The main thread function used to manage a logging device * journal. @@ -119,16 +109,12 @@ void __journal_internal_check(void) * known as checkpointing, and this thread is responsible for that job. */ -journal_t *current_journal; // AKPM: debug - -int kjournald(void *arg) +static int kjournald(void *arg) { journal_t *journal = (journal_t *) arg; transaction_t *transaction; struct timer_list timer; - current_journal = journal; - daemonize("kjournald"); /* Set up an interval timer which can be used to trigger a @@ -1441,7 +1427,7 @@ int journal_wipe(journal_t *journal, int * device this journal is present. */ -const char *journal_dev_name(journal_t *journal, char *buffer) +static const char *journal_dev_name(journal_t *journal, char *buffer) { struct block_device *bdev; @@ -1487,7 +1473,7 @@ void __journal_abort_hard(journal_t *jou /* Soft abort: record the abort error status in the journal superblock, * but don't do any other IO. 
*/ -void __journal_abort_soft (journal_t *journal, int errno) +static void __journal_abort_soft (journal_t *journal, int errno) { if (journal->j_flags & JFS_ABORT) return; @@ -1890,7 +1876,7 @@ EXPORT_SYMBOL(journal_enable_debug); static struct proc_dir_entry *proc_jbd_debug; -int read_jbd_debug(char *page, char **start, off_t off, +static int read_jbd_debug(char *page, char **start, off_t off, int count, int *eof, void *data) { int ret; @@ -1900,7 +1886,7 @@ int read_jbd_debug(char *page, char **st return ret; } -int write_jbd_debug(struct file *file, const char __user *buffer, +static int write_jbd_debug(struct file *file, const char __user *buffer, unsigned long count, void *data) { char buf[32]; @@ -1989,6 +1975,14 @@ static int __init journal_init(void) { int ret; +/* Static check for data structure consistency. There's no code + * invoked --- we'll just get a linker failure if things aren't right. + */ + extern void journal_bad_superblock_size(void); + if (sizeof(struct journal_superblock_s) != 1024) + journal_bad_superblock_size(); + + ret = journal_init_caches(); if (ret != 0) journal_destroy_caches(); From adilger at clusterfs.com Wed Jul 20 15:24:15 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Wed, 20 Jul 2005 11:24:15 -0400 Subject: [2.6 patch] fs/jbd/: cleanups In-Reply-To: <20050719141525.GJ5031@stusta.de> References: <20050719141525.GJ5031@stusta.de> Message-ID: <20050720152415.GA6704@schatzie.adilger.int> On Jul 19, 2005 16:15 +0200, Adrian Bunk wrote: > This patch contains the following cleanups: > - make needlessly global functions static > - journal.c: remove the unused global function __journal_internal_check > and move the check to journal_init > - remove the following write-only global variable: > - journal.c: current_journal > - remove the following unneeded EXPORT_SYMBOL: > - journal.c: journal_recover > > Signed-off-by: Adrian Bunk Signed-off-by: Andreas Dilger > --- > > fs/jbd/journal.c | 34 ++++++++++++++-------------------- 
> fs/jbd/revoke.c | 3 ++- > include/linux/jbd.h | 1 - > 3 files changed, 16 insertions(+), 22 deletions(-) > > --- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old 2005-06-14 03:58:20.000000000 +0200 > +++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h 2005-06-14 04:00:56.000000000 +0200 > @@ -914,7 +912,6 @@ > extern int journal_skip_recovery (journal_t *); > extern void journal_update_superblock (journal_t *, int); > extern void __journal_abort_hard (journal_t *); > -extern void __journal_abort_soft (journal_t *, int); > extern void journal_abort (journal_t *, int); > extern int journal_errno (journal_t *); > extern void journal_ack_err (journal_t *); > --- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old 2005-06-14 03:58:36.000000000 +0200 > +++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c 2005-06-14 03:58:41.000000000 +0200 > @@ -116,7 +116,8 @@ > (block << (hash_shift - 12))) & (table->hash_size - 1); > } > > -int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq) > +static int insert_revoke_hash(journal_t *journal, unsigned long blocknr, > + tid_t seq) > { > struct list_head *hash_list; > struct jbd_revoke_record_s *record; > > --- linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c.old 2005-07-19 15:53:16.000000000 +0200 > +++ linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c 2005-07-19 15:53:39.000000000 +0200 > @@ -65,7 +65,6 @@ EXPORT_SYMBOL(journal_set_features); > EXPORT_SYMBOL(journal_create); > EXPORT_SYMBOL(journal_load); > EXPORT_SYMBOL(journal_destroy); > -EXPORT_SYMBOL(journal_recover); > EXPORT_SYMBOL(journal_update_superblock); > EXPORT_SYMBOL(journal_abort); > EXPORT_SYMBOL(journal_errno); > @@ -81,6 +80,7 @@ EXPORT_SYMBOL(journal_try_to_free_buffer > EXPORT_SYMBOL(journal_force_commit); > > static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); > +static void __journal_abort_soft (journal_t *journal, int errno); > > /* > * Helper function used to manage commit timeouts > @@ -93,16 +93,6 @@ static void 
commit_timeout(unsigned long > wake_up_process(p); > } > > -/* Static check for data structure consistency. There's no code > - * invoked --- we'll just get a linker failure if things aren't right. > - */ > -void __journal_internal_check(void) > -{ > - extern void journal_bad_superblock_size(void); > - if (sizeof(struct journal_superblock_s) != 1024) > - journal_bad_superblock_size(); > -} > - > /* > * kjournald: The main thread function used to manage a logging device > * journal. > @@ -119,16 +109,12 @@ void __journal_internal_check(void) > * known as checkpointing, and this thread is responsible for that job. > */ > > -journal_t *current_journal; // AKPM: debug > - > -int kjournald(void *arg) > +static int kjournald(void *arg) > { > journal_t *journal = (journal_t *) arg; > transaction_t *transaction; > struct timer_list timer; > > - current_journal = journal; > - > daemonize("kjournald"); > > /* Set up an interval timer which can be used to trigger a > @@ -1441,7 +1427,7 @@ int journal_wipe(journal_t *journal, int > * device this journal is present. > */ > > -const char *journal_dev_name(journal_t *journal, char *buffer) > +static const char *journal_dev_name(journal_t *journal, char *buffer) > { > struct block_device *bdev; > > @@ -1487,7 +1473,7 @@ void __journal_abort_hard(journal_t *jou > > /* Soft abort: record the abort error status in the journal superblock, > * but don't do any other IO. 
*/ > -void __journal_abort_soft (journal_t *journal, int errno) > +static void __journal_abort_soft (journal_t *journal, int errno) > { > if (journal->j_flags & JFS_ABORT) > return; > @@ -1890,7 +1876,7 @@ EXPORT_SYMBOL(journal_enable_debug); > > static struct proc_dir_entry *proc_jbd_debug; > > -int read_jbd_debug(char *page, char **start, off_t off, > +static int read_jbd_debug(char *page, char **start, off_t off, > int count, int *eof, void *data) > { > int ret; > @@ -1900,7 +1886,7 @@ int read_jbd_debug(char *page, char **st > return ret; > } > > -int write_jbd_debug(struct file *file, const char __user *buffer, > +static int write_jbd_debug(struct file *file, const char __user *buffer, > unsigned long count, void *data) > { > char buf[32]; > @@ -1989,6 +1975,14 @@ static int __init journal_init(void) > { > int ret; > > +/* Static check for data structure consistency. There's no code > + * invoked --- we'll just get a linker failure if things aren't right. > + */ > + extern void journal_bad_superblock_size(void); > + if (sizeof(struct journal_superblock_s) != 1024) > + journal_bad_superblock_size(); > + > + > ret = journal_init_caches(); > if (ret != 0) > journal_destroy_caches(); Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From tuttle at bbs.fsik.cvut.cz Wed Jul 20 16:42:59 2005 From: tuttle at bbs.fsik.cvut.cz (Vlada Macek) Date: Wed, 20 Jul 2005 18:42:59 +0200 Subject: ext3 nodump attribute inheritance Message-ID: <42DE7F13.40503@bbs.cvut.cz> Hi, in the past I was considering the ways to back up files in my Linux home box. The important part of such thinking is how to set exclusion paths wisely. I finally found out that setting ext2/ext3 nodump attribute using chattr for files, but mainly directories, suits me best. 
Setting the match (regexp) lists for the backup script externally seems sub-optimal to me, since my dump/nodump data arrangement is not stable over time. I started using Schily's star because of its multiple advantages over GNU tar, true incremental dumps and honoring the nodump ext2/ext3 flag among others. But then I noticed what I call an unfortunate feature of ext2/ext3. The nodump flag of a directory is inherited by every new file/directory created inside that directory (the same goes for most of the ext2/ext3 flags). This feature would quickly wipe out the settings I made by hand! Consider for example: ~/tmp/ is nodump. When I create a file inside, it gets nodump too. When later this file develops into something useful and deserves to stay under ~/myprogs/ for example, I'll move it, but it still carries the nodump flag and therefore won't be dumped. I googled for a while for other users' experience with this feature, but it seems to me the nodump flag is not used much, or people feel ok with it or are unaware of it. For myself and for now, I solved this problem with the following one-line patch against kernel 2.6.8, ialloc.c/ext3_new_inode(): --- kernel-source-2.6.8-orig/fs/ext3/ialloc.c 2004-08-14 07:36:58.000000000 +0200 +++ kernel-source-2.6.8/fs/ext3/ialloc.c 2005-07-19 11:20:36.000000000 +0200 @@ -566,9 +566,9 @@ ei->i_next_alloc_goal = 0; ei->i_dir_start_lookup = 0; ei->i_disksize = 0; - ei->i_flags = EXT3_I(dir)->i_flags & ~EXT3_INDEX_FL; + ei->i_flags = EXT3_I(dir)->i_flags & ~(EXT3_INDEX_FL | EXT3_NODUMP_FL); if (S_ISLNK(mode)) ei->i_flags &= ~(EXT3_IMMUTABLE_FL|EXT3_APPEND_FL); /* dirsync only applies to directories */ if (!S_ISDIR(mode)) Now my new files and dirs do not inherit the nodump flag from their parent dir anymore. Of course, however tiny this change is, I rate it useful enough for others that I'm sharing it. Changing the filesystem behaviour for everyone is of course problematic.
I do not know whether there are (or will be in the future) other flags deserving selective non-inheritance. So this is my generalized idea: leave the present behaviour as the default and let the user configure the mask of flags that should not be inherited from parent dirs, for example via tune2fs, in the manner of chattr. Maybe with a syntax like this: tune2fs -I+-=[ASacDdIijsTtu] where -I is the new option meaning Inheritance. What do you all think? Would you be satisfied with the nodump flag inheritance if you used the flag for your backups? Am I alone? Thanks in advance, -- \//\/\ (Sometimes credited as 1494 F8DD 6379 4CD7 E7E3 1FC9 D750 4243 1F05 9424.) From poiqwepoi at gmail.com Sat Jul 23 02:14:20 2005 From: poiqwepoi at gmail.com (ESM) Date: Fri, 22 Jul 2005 22:14:20 -0400 Subject: Recovering lost file... Message-ID: <38b74966050722191410c04670@mail.gmail.com> Here goes the story... I was manipulating a picture (jpg) with gimp and mistakenly saved the file instead of doing a save as. I assume that what gimp did was to write the new file and then unlink the old one. Now I have found the address where the original file starts and I can read the first 8192 bytes of it. I found it by searching for the date and time specified in the jpeg header, which did not change from the original file to the new one. Is there a way to find the missing blocks? Thanks in advance. From tom at thesnail.org Mon Jul 25 23:50:02 2005 From: tom at thesnail.org (Tom Coleman) Date: Tue, 26 Jul 2005 09:50:02 +1000 Subject: [Fwd: e2fsck Segmentation Fault] Message-ID: <1122335402.10813.2.camel@kofi> Hi, Somehow I've managed to get e2fsck to seg fault.. The filesystem in question started acting very strangely (e.g.
filenames changing from music to MuSiC etc) so I rebooted, and since when fsck has crashed every time it has been run. I'm not really sure what any of this means, so I didn't know what debugging output to include, but below is the output of e2fsck. (I would have attached the results of dumpe2fs, but the mailing list complained). Thanks for any help; let me know any more information I can provide (I am running debian unstable with a new (2.6.11-ac) kernel (although it segfaulted on an older kernel too)) e2fsck output: e2fsck 1.38 (30-Jun-2005) /dev/hde1 contains a file system with errors, check forced. Pass 1: Checking inodes, blocks, and sizes Root inode is not a directory. Clear? yes Pass 2: Checking directory structure Entry '..' in ??? (1785857) has deleted/unused inode 12. Clear? yes Missing '..' in directory inode 2162689. Fix? yes Entry '..' in ... (2162689) has deleted/unused inode 2. Clear? yes Missing '..' in directory inode 3129345. Fix? yes Entry '..' in ... (3129345) has deleted/unused inode 2. Clear? yes Entry '..' in ??? (5931009) has deleted/unused inode 12. Clear? yes Missing '..' in directory inode 8601601. Fix? yes Entry '..' in ... (8601601) has deleted/unused inode 2. Clear? yes Pass 3: Checking directory connectivity Root inode not allocated. Allocate? yes Unconnected directory inode 5931009 (...) Connect to /lost+found? yes /lost+found not found. Create? yes Unconnected directory inode 1785857 (...) Connect to /lost+found? yes Unconnected directory inode 2162689 (...) Connect to /lost+found? yes Unconnected directory inode 3129345 (...) Connect to /lost+found? yes Unconnected directory inode 8601601 (...) Connect to /lost+found? yes Pass 4: Checking reference counts i_file_acl for inode 11 (...) is 536879104, should be zero. Clear? yes i_faddr for inode 11 (...) is 536879104, should be zero. Clear? yes i_fsize for inode 11 (...) is 32, should be zero. Clear? 
yes Segmentation fault From adilger at clusterfs.com Tue Jul 26 07:30:39 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 26 Jul 2005 01:30:39 -0600 Subject: [Fwd: e2fsck Segmentation Fault] In-Reply-To: <1122335402.10813.2.camel@kofi> References: <1122335402.10813.2.camel@kofi> Message-ID: <20050726073038.GV6126@schatzie.adilger.int> On Jul 26, 2005 09:50 +1000, Tom Coleman wrote: > Somehow I've managed to get e2fsck to seg fault.. The filesystem in > question started acting very strangely (e.g. filenames changing from > music to MuSiC etc) so I rebooted, and since when fsck has crashed every > time it has been run. It appears you are getting single-bit errors, either from your RAM, cable or internal to the drive. > Thanks for any help; let me know any more information I can provide > (I am running debian unstable with a new (2.6.11-ac) kernel (although it > segfaulted on an older kernel too)) > > e2fsck output: > e2fsck 1.38 (30-Jun-2005) > /dev/hde1 contains a file system with errors, check forced. > Pass 1: Checking inodes, blocks, and sizes > Root inode is not a directory. Clear? yes This might be interesting to look at, if only to prove the single-bit error theory. If you start debugfs /dev/hde1, and "stat <2>" it should show what is wrong with the root directory, as will "stat <12>". It may well be that they are just corrupted outright, hard to say. > Pass 2: Checking directory structure > Entry '..' in ??? (1785857) has deleted/unused inode 12. Clear? yes > > Missing '..' in directory inode 2162689. > Fix? yes > > Entry '..' in ... (2162689) has deleted/unused inode 2. Clear? yes > > Missing '..' in directory inode 3129345. > Fix? yes > > Entry '..' in ... (3129345) has deleted/unused inode 2. Clear? yes > > Entry '..' in ??? (5931009) has deleted/unused inode 12. Clear? yes > > Missing '..' in directory inode 8601601. > Fix? yes > > Entry '..' in ... (8601601) has deleted/unused inode 2. Clear? 
yes > > Pass 3: Checking directory connectivity > Root inode not allocated. Allocate? yes > > Unconnected directory inode 5931009 (...) > Connect to /lost+found? yes > > /lost+found not found. Create? yes > > Unconnected directory inode 1785857 (...) > Connect to /lost+found? yes > > Unconnected directory inode 2162689 (...) > Connect to /lost+found? yes > > Unconnected directory inode 3129345 (...) > Connect to /lost+found? yes > > Unconnected directory inode 8601601 (...) > Connect to /lost+found? yes > > Pass 4: Checking reference counts > i_file_acl for inode 11 (...) is 536879104, should be zero. > Clear? yes > > i_faddr for inode 11 (...) is 536879104, should be zero. > Clear? yes > > i_fsize for inode 11 (...) is 32, should be zero. > Clear? yes These also appear to be single bit errors, 0x20002000 or 0x20. > Segmentation fault If you compile a new e2fsck (with -g) and run it under gdb it will tell you what is going wrong. Up until here there are only a couple of minor errors, with / and lost+found. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From adilger at clusterfs.com Tue Jul 26 07:37:18 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 26 Jul 2005 01:37:18 -0600 Subject: [Fwd: e2fsck Segmentation Fault] In-Reply-To: <1122335402.10813.2.camel@kofi> References: <1122335402.10813.2.camel@kofi> Message-ID: <20050726073718.GX6126@schatzie.adilger.int> On Jul 26, 2005 09:50 +1000, Tom Coleman wrote: > Somehow I've managed to get e2fsck to seg fault.. The filesystem in > question started acting very strangely (e.g. filenames changing from > music to MuSiC etc) so I rebooted, and since when fsck has crashed every > time it has been run. Oh, see also (just posted to ext2-devel): http://thunk.org/hg/e2fsprogs/?cmd=changeset;node=0502b63a5be9cb490c0c9086fa05edc1b1712a78 Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
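A side note on the single-bit reading in this thread: the numbers quoted from the e2fsck output can be checked with a few lines of userspace C. This is only a minimal sketch under stated assumptions, not code from e2fsprogs or the kernel, and the function names flipped_bits and differ_only_by_mask are made up for illustration. It shows that 32 is 0x20 (one set bit), that 536879104 is 0x20002000 (two set bits, the same bit 0x2000 in each 16-bit half), and that "music" and "MuSiC" differ only in bit 0x20, which is the ASCII case bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between an expected and an observed value. */
unsigned flipped_bits(uint32_t expected, uint32_t observed)
{
	uint32_t diff = expected ^ observed;
	unsigned count = 0;

	while (diff) {
		diff &= diff - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/*
 * Return 1 if the two strings have the same length and every byte
 * that differs between them differs by exactly `mask`, else 0.
 */
int differ_only_by_mask(const char *a, const char *b, unsigned char mask)
{
	size_t i;

	assert(a != NULL && b != NULL);
	for (i = 0; a[i] != '\0' && b[i] != '\0'; i++) {
		unsigned char diff = (unsigned char)a[i] ^ (unsigned char)b[i];

		if (diff != 0 && diff != mask)
			return 0;
	}
	return a[i] == b[i];	/* true only if both strings ended here */
}
```

Called as flipped_bits(0, 536879104) and differ_only_by_mask("music", "MuSiC", 0x20), the two helpers give a quick way to test whether an unexpected value is a small number of flipped bits away from the expected one. That is consistent with failing RAM or cabling, though it does not prove it.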
From tom at thesnail.org Wed Jul 27 04:21:32 2005 From: tom at thesnail.org (Tom Coleman) Date: Wed, 27 Jul 2005 14:21:32 +1000 Subject: [Fwd: e2fsck Segmentation Fault] In-Reply-To: <20050726073718.GX6126@schatzie.adilger.int> References: <1122335402.10813.2.camel@kofi> <20050726073718.GX6126@schatzie.adilger.int> Message-ID: <1122438092.10820.1.camel@kofi> Bingo. I tried e2fsprogs-1.35 and there was no seg fault. Thanks heaps for the help. P.S I think you were right and the RAM was screwed too. On Tue, 2005-07-26 at 01:37 -0600, Andreas Dilger wrote: > On Jul 26, 2005 09:50 +1000, Tom Coleman wrote: > > Somehow I've managed to get e2fsck to seg fault.. The filesystem in > > question started acting very strangely (e.g. filenames changing from > > music to MuSiC etc) so I rebooted, and since when fsck has crashed every > > time it has been run. > > Oh, see also (just posted to ext2-devel): > http://thunk.org/hg/e2fsprogs/?cmd=changeset;node=0502b63a5be9cb490c0c9086fa05edc1b1712a78 > > Cheers, Andreas > -- > Andreas Dilger > Principal Software Engineer > Cluster File Systems, Inc. > > From mag.andersen at gmail.com Wed Jul 27 21:05:34 2005 From: mag.andersen at gmail.com (Magnus Andersen) Date: Wed, 27 Jul 2005 17:05:34 -0400 Subject: high context switching and high load averages slowing down system Message-ID: <5ea1658405072714052ad381d2@mail.gmail.com> Hi All, I have a HP DL 580 with 4 3 GHz CPUs and 4 GB RAM. I'm running Oracle on it. Throughout the day I am getting high load averages (6 - 18) and at the same time I see context switching go over 300,000. Sometimes over 500,000. This is slowing the system down to a crawl. My OS is RHEL 3 AS Update 4 with the 2.4.21-32.0.1.ELsmp kernel. Any ideas on why this is happening and how to fix it? Thanks in advance, -- Magnus Andersen Systems Administrator / Oracle DBA Walker & Associates, Inc. From theman at josephdwagner.info Wed Jul 27 22:01:11 2005 From: theman at josephdwagner.info (Joseph D. 
Wagner) Date: Wed, 27 Jul 2005 17:01:11 -0500 Subject: high context switching and high load averages slowing down system In-Reply-To: <5ea1658405072714052ad381d2@mail.gmail.com> Message-ID: <48vksc$18b7ab4@mxip19a.cluster1.charter.net> This is the EXT3 File System mailing list. I don't mean to be rude; it's just that you may get better answers to your questions on another mailing list. > Any ideas on why this is happening and how to fix it? Off the top of my head, it sounds like your system is thrashing. Is there some out-of-control process hogging all the memory? Some links I found on google.com include: http://www.unix.org.ua/orelly/oracle/guide8i/ch05_01.htm The 2.6 kernel has better multitasking capabilities. You may want to try building a custom kernel with SMP and 4 GB High Memory with Preemptive Multitasking turned on. But like I said, this is all off the top of my head. Joseph D. Wagner From drakoulegonas at kolasi.gr Mon Jul 25 12:24:19 2005 From: drakoulegonas at kolasi.gr (Professor Stafylopaths) Date: Mon, 25 Jul 2005 15:24:19 +0300 Subject: Strange corruption (?) problem Message-ID: <1122294259.42e4d9f34b277@webmail.teilam.gr> Hello, I seem to have a strange problem with an ext3 fs partition. Whenever I transfer several files to this partition and compare their md5 and sha1 sums with those of the originals, they don't match. This seems to happen only with large files (~700MB). I am using a vanilla 2.4.27 Linux kernel (I haven't seen any significant changelog entries up to the latest kernel that might be relevant to my problem). I've also run e2fsck -C -v -c -f as well as on the partition which didn't reveal any problems at all.
This partition was corrupted somehow in the past, but was completely recovered by fsck, although it probably now makes use of a backup superblock (when I try to mount the partition without any parameters the kernel mounts it as ext2 with the warning "EXT2-fs warning (device ide0(3,65)): ext2_read_super: mounting ext3 filesystem as ext2", so I always specify the fs type with the -t ext3 parameter). I don't know whether this is relevant to the current problem. Any ideas?
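One way to narrow down a checksum mismatch like this is to compare the copy against the original byte by byte and see where and how often they differ, much as cmp -l would. The following is a minimal sketch only, not something the poster ran; the function name count_differing_bytes is hypothetical. It assumes both files are readable through stdio and are smaller than the range of long, which holds for the ~700MB files mentioned here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compare two files byte by byte. Returns the number of differing
 * bytes, or -1 if either file cannot be opened or one file ends
 * before the other (a read error also shows up as an early end).
 * If first_diff is non-NULL and a mismatch is found, it receives
 * the offset of the first mismatching byte.
 */
long count_differing_bytes(const char *path_a, const char *path_b,
			   long *first_diff)
{
	FILE *a, *b;
	long offset = 0, ndiff = 0;
	int ca, cb;

	assert(path_a != NULL && path_b != NULL);
	a = fopen(path_a, "rb");
	b = fopen(path_b, "rb");
	if (a == NULL || b == NULL) {
		if (a != NULL)
			fclose(a);
		if (b != NULL)
			fclose(b);
		return -1;
	}

	for (;;) {
		ca = getc(a);
		cb = getc(b);
		if (ca == EOF || cb == EOF)
			break;
		if (ca != cb) {
			if (ndiff == 0 && first_diff != NULL)
				*first_diff = offset;
			ndiff++;
		}
		offset++;
	}
	fclose(a);
	fclose(b);

	/* Both streams must have reached their end at the same offset. */
	if (ca != cb)
		return -1;
	return ndiff;
}
```

A handful of isolated differing bytes would point towards bit flips in RAM, cabling or the drive, whereas whole runs of differing data at block-aligned offsets would point more towards the filesystem or block layer. Either way this only locates the damage; it does not identify the cause.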