From evilninja at gmx.net  Fri Jul  1 00:20:13 2005
From: evilninja at gmx.net (evilninja)
Date: Fri, 01 Jul 2005 02:20:13 +0200
Subject: [Q] Is errors=panic safe to use, and will it detect a RAID gone
 psycho?
In-Reply-To: <a06230913bee365817415@[129.98.90.227]>
References: <a06230913bee365817415@[129.98.90.227]>
Message-ID: <42C48C3D.50305@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Maurice Volaski wrote:
> If I set the error behavior with tune2fs to panic, would this happen?
> That is, is this the type of error that would trigger a panic? Are there
> minor errors that could unnecessarily trigger one?

panic seems to be triggered in fs/ext3/super.c; there are some conditions
to be met in ext3_handle_error() and ext3_abort(). either get a clue from
that, or perhaps some ext2 guru can comment as to *when* exactly a panic is
triggered ;-)


- --
BOFH excuse #12:

dry joints on cable plug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCxIw9C/PVm5+NVoYRAqVRAJ4iTIbmFvi1OoqqcZPyuFtzeo7OkQCg2fAO
kIOJsD6artMMh49BIYfj1Ks=
=thhq
-----END PGP SIGNATURE-----


From evilninja at gmx.net  Fri Jul  1 00:23:35 2005
From: evilninja at gmx.net (evilninja)
Date: Fri, 01 Jul 2005 02:23:35 +0200
Subject: Assertion failure in do_get_write_access()
In-Reply-To: <20050626151224.L25249-100000@xs3.xs4all.nl>
References: <20050626151224.L25249-100000@xs3.xs4all.nl>
Message-ID: <42C48D07.9090801@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Yuri van Oers wrote:
> Hi,
> 
> I just had my server cry this out to the console:
> 
> Assertion failure in do_get_write_access() at transaction.c:658:
> "jh->b_transaction == journal->j_committing_transaction"
> kernel BUG at transaction.c:658!

is it reproducible? does it happen with later/earlier kernels too?
was the fs corrupted after this? if yes, did fsck fix it?

- --
BOFH excuse #115:

your keyboard's space bar is generating spurious keycodes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCxI0HC/PVm5+NVoYRAmNdAJ0Z3LgMzliSJoRks23q2h/4F+j9wQCfc/Lx
6wXFBsuKbsRZaDXk2nImZps=
=bXLa
-----END PGP SIGNATURE-----


From evilninja at gmx.net  Fri Jul  1 00:38:04 2005
From: evilninja at gmx.net (evilninja)
Date: Fri, 01 Jul 2005 02:38:04 +0200
Subject: How to figure out underlying failed disk(partitions) and
 sector(s) position ???
In-Reply-To: <20050628220151.27698.qmail@web30202.mail.mud.yahoo.com>
References: <20050628220151.27698.qmail@web30202.mail.mud.yahoo.com>
Message-ID: <42C4906C.2010008@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ha haha wrote:
> Jun 21 16:55:09 host1 kernel: end_request: I/O error,
> dev 03:0b (hda), sector 196487120 

afaict, this message comes from drivers/block/ll_rw_blk.c, so it is not really
ext2 specific and should probably go to lkml (care to send a patch?).
i think the reiserfs folks had something similar in the works for
reiserfs-specific error messages, but i can't remember the name of the
(mail) thread.

> Q3: what does the "high=13, low=16200426" means?

again not ext2 specific, as this is from drivers/ide/ide-lib.c.

- --
BOFH excuse #47:

Complete Transient Lockout
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCxJBsC/PVm5+NVoYRAkdFAJ455ZWjNo9XulUZy2wJl+6H4AMqZwCg1fZ0
zzlr1OYCyAiLMJXG1Fyi5Cc=
=09yx
-----END PGP SIGNATURE-----


From bunk at stusta.de  Sat Jul  2 23:51:12 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Sun, 3 Jul 2005 01:51:12 +0200
Subject: [2.6 patch] fs/jbd/: possible cleanups
Message-ID: <20050702235112.GK5346@stusta.de>

This patch contains the following possible cleanups:
- make needlessly global functions static
- journal.c: remove the unused global function __journal_internal_check
             and move the check to journal_init
- remove the following write-only global variable:
  - journal.c: current_journal
- remove the following unneeded EXPORT_SYMBOL's:
  - journal.c: journal_check_used_features
  - journal.c: journal_recover

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

This patch was already sent on:
- 14 Jun 2005

 fs/jbd/journal.c    |   41 ++++++++++++++++++-----------------------
 fs/jbd/revoke.c     |    3 ++-
 include/linux/jbd.h |    3 ---
 3 files changed, 20 insertions(+), 27 deletions(-)

--- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old	2005-06-14 03:58:20.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h	2005-06-14 04:00:56.000000000 +0200
@@ -900,8 +900,6 @@
 				int start, int len, int bsize);
 extern journal_t * journal_init_inode (struct inode *);
 extern int	   journal_update_format (journal_t *);
-extern int	   journal_check_used_features 
-		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   journal_check_available_features 
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   journal_set_features 
@@ -914,7 +912,6 @@
 extern int	   journal_skip_recovery	(journal_t *);
 extern void	   journal_update_superblock	(journal_t *, int);
 extern void	   __journal_abort_hard	(journal_t *);
-extern void	   __journal_abort_soft	(journal_t *, int);
 extern void	   journal_abort      (journal_t *, int);
 extern int	   journal_errno      (journal_t *);
 extern void	   journal_ack_err    (journal_t *);
--- linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c.old	2005-06-14 03:57:39.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c	2005-06-14 04:08:24.000000000 +0200
@@ -59,13 +59,11 @@
 EXPORT_SYMBOL(journal_init_dev);
 EXPORT_SYMBOL(journal_init_inode);
 EXPORT_SYMBOL(journal_update_format);
-EXPORT_SYMBOL(journal_check_used_features);
 EXPORT_SYMBOL(journal_check_available_features);
 EXPORT_SYMBOL(journal_set_features);
 EXPORT_SYMBOL(journal_create);
 EXPORT_SYMBOL(journal_load);
 EXPORT_SYMBOL(journal_destroy);
-EXPORT_SYMBOL(journal_recover);
 EXPORT_SYMBOL(journal_update_superblock);
 EXPORT_SYMBOL(journal_abort);
 EXPORT_SYMBOL(journal_errno);
@@ -81,6 +79,7 @@
 EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
+static void __journal_abort_soft (journal_t *journal, int errno);
 
 /*
  * Helper function used to manage commit timeouts
@@ -93,16 +92,6 @@
 	wake_up_process(p);
 }
 
-/* Static check for data structure consistency.  There's no code
- * invoked --- we'll just get a linker failure if things aren't right.
- */
-void __journal_internal_check(void)
-{
-	extern void journal_bad_superblock_size(void);
-	if (sizeof(struct journal_superblock_s) != 1024)
-		journal_bad_superblock_size();
-}
-
 /*
  * kjournald: The main thread function used to manage a logging device
  * journal.
@@ -119,16 +108,12 @@
  *    known as checkpointing, and this thread is responsible for that job.
  */
 
-journal_t *current_journal;		// AKPM: debug
-
-int kjournald(void *arg)
+static int kjournald(void *arg)
 {
 	journal_t *journal = (journal_t *) arg;
 	transaction_t *transaction;
 	struct timer_list timer;
 
-	current_journal = journal;
-
 	daemonize("kjournald");
 
 	/* Set up an interval timer which can be used to trigger a
@@ -1181,8 +1166,10 @@
  * features.  Return true (non-zero) if it does. 
  **/
 
-int journal_check_used_features (journal_t *journal, unsigned long compat,
-				 unsigned long ro, unsigned long incompat)
+static int journal_check_used_features (journal_t *journal,
+					unsigned long compat,
+					unsigned long ro,
+					unsigned long incompat)
 {
 	journal_superblock_t *sb;
 
@@ -1439,7 +1426,7 @@
  * device this journal is present.
  */
 
-const char *journal_dev_name(journal_t *journal, char *buffer)
+static const char *journal_dev_name(journal_t *journal, char *buffer)
 {
 	struct block_device *bdev;
 
@@ -1485,7 +1472,7 @@
 
 /* Soft abort: record the abort error status in the journal superblock,
  * but don't do any other IO. */
-void __journal_abort_soft (journal_t *journal, int errno)
+static void __journal_abort_soft (journal_t *journal, int errno)
 {
 	if (journal->j_flags & JFS_ABORT)
 		return;
@@ -1888,7 +1875,7 @@
 
 static struct proc_dir_entry *proc_jbd_debug;
 
-int read_jbd_debug(char *page, char **start, off_t off,
+static int read_jbd_debug(char *page, char **start, off_t off,
 			  int count, int *eof, void *data)
 {
 	int ret;
@@ -1898,7 +1885,7 @@
 	return ret;
 }
 
-int write_jbd_debug(struct file *file, const char __user *buffer,
+static int write_jbd_debug(struct file *file, const char __user *buffer,
 			   unsigned long count, void *data)
 {
 	char buf[32];
@@ -1987,6 +1974,14 @@
 {
 	int ret;
 
+/* Static check for data structure consistency.  There's no code
+ * invoked --- we'll just get a linker failure if things aren't right.
+ */
+	extern void journal_bad_superblock_size(void);
+	if (sizeof(struct journal_superblock_s) != 1024)
+		journal_bad_superblock_size();
+
+
 	ret = journal_init_caches();
 	if (ret != 0)
 		journal_destroy_caches();
--- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old	2005-06-14 03:58:36.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c	2005-06-14 03:58:41.000000000 +0200
@@ -116,7 +116,8 @@
 		(block << (hash_shift - 12))) & (table->hash_size - 1);
 }
 
-int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq)
+static int insert_revoke_hash(journal_t *journal, unsigned long blocknr,
+			      tid_t seq)
 {
 	struct list_head *hash_list;
 	struct jbd_revoke_record_s *record;


From lists at wolfram.schlich.org  Fri Jul  8 16:03:37 2005
From: lists at wolfram.schlich.org (Wolfram Schlich)
Date: Fri, 8 Jul 2005 18:03:37 +0200
Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible?
Message-ID: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org>

Hi,

I accidentally issued "mkswap" on a used ext3 fs partition (~30G) :-/

I have analyzed the behaviour of mkswap using two test files and it
appears to only change "some" bytes:
--8<--
--- swap2.xxd   2005-07-04 21:00:10.157261360 +0200
+++ swap1.xxd   2005-07-04 21:00:01.894517488 +0200
@@ -62,7 +62,7 @@
 00003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00003e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00003f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-0000400: 0000 0000 0000 0000 0000 0000 0000 0000  ................
+0000400: 0100 0000 ff09 0000 0000 0000 0000 0000  ................
 0000410: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000420: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000430: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -253,7 +253,7 @@
 0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
+0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532  ......SWAPSPACE2
 0001000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0001010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0001020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
--8<--

I created an image (hdb1.img) of the damaged partition using dd
and tried to work with various tools on it.
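
For orientation, the two changed regions in the diff above can be checked
against known on-disk offsets: the primary ext2/ext3 superblock starts at
byte 1024 (0x400) with its magic number 0xEF53 stored 56 bytes in, and the
SWAPSPACE2 signature occupies the last 10 bytes of the first page. A minimal
Python sketch, assuming a 4096-byte page size (consistent with the signature
ending at 0xfff above); the probe() helper is hypothetical and only inspects
the first page of an image such as hdb1.img:

```python
import struct

EXT2_SB_OFFSET = 1024       # primary superblock starts at byte 1024 (0x400)
EXT2_MAGIC_OFFSET = 56      # s_magic sits 56 bytes into the superblock
EXT2_MAGIC = 0xEF53         # stored little-endian on disk
SWAP_SIGNATURE = b"SWAPSPACE2"


def probe(first_page: bytes, page_size: int = 4096) -> dict:
    """Report which signatures are present in the first page of an image.

    page_size is an assumption (4096); mkswap places its signature in the
    last 10 bytes of the first page.
    """
    (magic,) = struct.unpack_from("<H", first_page,
                                  EXT2_SB_OFFSET + EXT2_MAGIC_OFFSET)
    signature = first_page[page_size - len(SWAP_SIGNATURE):page_size]
    return {
        "ext2_magic": magic == EXT2_MAGIC,
        "swap_signature": signature == SWAP_SIGNATURE,
    }


# Usage (hypothetical):
#   with open("hdb1.img", "rb") as image:
#       print(probe(image.read(4096)))
```

Note that the bytes changed at 0x400-0x407 in the diff fall on the first two
superblock fields (inode count and block count); a test on an all-zero file
cannot show whether anything else in that region is overwritten.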

Here is the output of 'fsck.ext3 -n -v hdb1.img':
--8<--
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
hdb1.img was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (24043, counted=0).
Fix? no

Free blocks count wrong for group #1 (32250, counted=0).
Fix? no

Free blocks count wrong for group #2 (32253, counted=0).
Fix? no

Free blocks count wrong for group #3 (32250, counted=158).
Fix? no

Free blocks count wrong for group #4 (32253, counted=8).
Fix? no

Free blocks count wrong for group #5 (32250, counted=28).
Fix? no

Free blocks count wrong for group #6 (32253, counted=6822).
Fix? no

Free blocks count wrong for group #7 (32250, counted=10428).
Fix? no

Free blocks count wrong for group #8 (32253, counted=11170).
Fix? no

Free blocks count wrong for group #9 (32250, counted=4239).
Fix? no

Free blocks count wrong for group #10 (32253, counted=24482).
Fix? no

Free blocks count wrong for group #11 (32253, counted=21184).
Fix? no

Free blocks count wrong for group #12 (32253, counted=25657).
Fix? no

Free blocks count wrong for group #13 (32253, counted=13674).
Fix? no

Free blocks count wrong for group #14 (32253, counted=15007).
Fix? no

Free blocks count wrong for group #15 (32253, counted=11366).
Fix? no

[ removed many lines, complete log file at
  http://wolfram.schlich.org/tmp/fsck.ext3_-n_-v_hdb1.img ]

Free inodes count wrong for group #213 (16416, counted=14498).
Fix? no

Directories count wrong for group #213 (0, counted=241).
Fix? no

Free inodes count wrong for group #214 (16416, counted=14524).
Fix? no

Directories count wrong for group #214 (0, counted=126).
Fix? no

Free inodes count wrong for group #215 (16416, counted=14441).
Fix? no

Directories count wrong for group #215 (0, counted=114).
Fix? no

Free inodes count wrong for group #216 (16416, counted=15214).
Fix? no

Directories count wrong for group #216 (0, counted=99).
Fix? no

Free inodes count wrong for group #217 (16416, counted=14898).
Fix? no

Directories count wrong for group #217 (0, counted=216).
Fix? no

Free inodes count wrong for group #218 (16416, counted=14878).
Fix? no

Directories count wrong for group #218 (0, counted=187).
Fix? no

Free inodes count wrong for group #219 (16416, counted=16033).
Fix? no

Directories count wrong for group #219 (0, counted=37).
Fix? no

Free inodes count wrong for group #220 (16416, counted=14949).
Fix? no

Directories count wrong for group #220 (0, counted=128).
Fix? no

Free inodes count wrong for group #221 (16416, counted=15167).
Fix? no

Directories count wrong for group #221 (0, counted=102).
Fix? no

Free inodes count wrong for group #222 (16416, counted=15908).
Fix? no

Directories count wrong for group #222 (0, counted=79).
Fix? no

Free inodes count wrong for group #223 (16416, counted=14719).
Fix? no

Directories count wrong for group #223 (0, counted=117).
Fix? no

Free inodes count wrong for group #224 (16416, counted=14212).
Fix? no

Directories count wrong for group #224 (0, counted=165).
Fix? no

Free inodes count wrong for group #225 (16416, counted=14104).
Fix? no

Directories count wrong for group #225 (0, counted=118).
Fix? no

Free inodes count wrong for group #226 (16416, counted=14634).
Fix? no

Directories count wrong for group #226 (0, counted=227).
Fix? no

Free inodes count wrong for group #227 (16416, counted=14616).
Fix? no

Directories count wrong for group #227 (0, counted=198).
Fix? no

Free inodes count wrong for group #228 (16416, counted=14622).
Fix? no

Directories count wrong for group #228 (0, counted=139).
Fix? no

Free inodes count wrong (3759253, counted=3348765).
Fix? no


hdb1.img: ********** WARNING: Filesystem still has errors **********


      11 inodes used (0%)
    7136 non-contiguous inodes (64872.7%)
         # of inodes with ind/dind/tind blocks: 50303/388/0
  126175 blocks used (1%)
       0 bad blocks
       0 large files

  377976 regular files
   30868 directories
       0 character device files
       0 block device files
      13 fifos
      48 links
    1632 symbolic links (1631 fast symbolic links)
       0 sockets
--------
  410537 files
--8<--
I guess if I let fsck "fix" it, the damage would be bigger than
the benefit -- those numbers look scary to me:
--8<--
      11 inodes used (0%)
    7136 non-contiguous inodes (64872.7%)
         # of inodes with ind/dind/tind blocks: 50303/388/0
  126175 blocks used (1%)
--8<--
I haven't even tried to have it "fix" the fs on the image file.
What do you think?
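
A quick sanity check of those numbers, sketched in Python. It assumes 229
block groups (the log runs from group #0 to #228) and 16416 inodes per group
(taken from the per-group free counts in the log, which report every inode in
those groups as free); both are inferences from the log, not values read from
the superblock.

```python
# Values copied from the e2fsck output above.
FREE_INODES_SUPERBLOCK = 3759253   # "Free inodes count wrong (3759253, ..."
FREE_INODES_COUNTED = 3348765      # "... counted=3348765)"
NON_CONTIGUOUS = 7136              # "7136 non-contiguous inodes"

# Assumptions inferred from the log, not read from the superblock.
GROUPS = 229
INODES_PER_GROUP = 16416

total_inodes = GROUPS * INODES_PER_GROUP
used_per_superblock = total_inodes - FREE_INODES_SUPERBLOCK
used_as_counted = total_inodes - FREE_INODES_COUNTED
percent = round(100.0 * NON_CONTIGUOUS / used_per_superblock, 1)

print("inodes used according to the superblock summary:", used_per_superblock)
print("inodes used according to the scan:", used_as_counted)
print("non-contiguous percentage from the summary count:", percent)
```

Under these assumptions, the "11 inodes used" and "64872.7%" figures both
follow from the free-inode count stored in the superblock that e2fsck used,
while the count from the actual scan is close to the 410537 files e2fsck
reports.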

e2salvage doesn't recognize any superblocks (not even the backup
superblocks dumpe2fs happily uses to display some information on the
fs image), maybe because it's not an ext2 but ext3 fs. Well.

Currently I'm running e2retrieve on the image to see whether this is
able to do some recovery, but no results yet.

Any suggestions? It's hard to believe that those few changed bytes
should make the whole fs unrecoverable, isn't it?

Thanks in advance!
-- 
Wolfram Schlich


From adilger at clusterfs.com  Fri Jul  8 16:51:37 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 8 Jul 2005 10:51:37 -0600
Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible?
In-Reply-To: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org>
References: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org>
Message-ID: <20050708165137.GB5335@schatzie.adilger.int>

On Jul 08, 2005  18:03 +0200, Wolfram Schlich wrote:
> I accidentally issued "mkswap" on a used ext3 fs partition (~30G) :-/
> 
> I have analyzed the behaviour of mkswap using two test files and it
> appears to only change "some" bytes:
> --8<--
> --- swap2.xxd   2005-07-04 21:00:10.157261360 +0200
> +++ swap1.xxd   2005-07-04 21:00:01.894517488 +0200
> @@ -62,7 +62,7 @@
>  00003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  00003e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  00003f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> -0000400: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> +0000400: 0100 0000 ff09 0000 0000 0000 0000 0000  ................
>  0000410: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0000420: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0000430: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> @@ -253,7 +253,7 @@
>  0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> -0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> +0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532  ......SWAPSPACE2
>  0001000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0001010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>  0001020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> --8<--

Try starting with a test file which is not all zero (e.g. copy from
/dev/urandom) and see how much is changed.
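
A minimal Python sketch of that comparison, assuming you keep a copy of the
test file from before mkswap and compare it with the file afterwards. The
changed_ranges() helper and the file names in the usage comment are
hypothetical:

```python
def changed_ranges(before: bytes, after: bytes):
    """Return a list of (start, end) byte ranges, end exclusive,
    in which the two buffers differ. Only the common length is compared."""
    length = min(len(before), len(after))
    ranges = []
    start = None
    for i in range(length):
        if before[i] != after[i]:
            if start is None:
                start = i            # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges


# Usage (hypothetical file names; only the first 64 KiB is compared):
#   with open("test.before", "rb") as a, open("test.after", "rb") as b:
#       for start, end in changed_ranges(a.read(65536), b.read(65536)):
#           print(hex(start), hex(end))
```

With random data in the file, any region that mkswap zeroes or rewrites shows
up as a range, which an all-zero test file cannot reveal.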

> Any suggestions? It's hard to believe that those few changed bytes
> should make the whole fs unrecoverable, isn't it?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From hahaha_30k at yahoo.com  Fri Jul  8 17:33:19 2005
From: hahaha_30k at yahoo.com (ha haha)
Date: Fri, 8 Jul 2005 10:33:19 -0700 (PDT)
Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible?
In-Reply-To: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org>
Message-ID: <20050708173319.57811.qmail@web30202.mail.mud.yahoo.com>


Try to follow the steps below:

1, save all the contents of the partition to another
hard disk, or to a big file, so that your test work will
not destroy any useful data.

 dd if=<mkswapped_partition> of=<someWhereToSave>

2, run " mkfs.ext3 -n <mkswapped_partition> " to get the
list of super-block backup copies (-n only prints what
mkfs would do and does not write anything). for example:

Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 2388787


3, use one of the super-block backups near the end of the
partition to recover the file system structure.

 e2fsck -b 20480000 <mkswapped_partition>

4, If the above doesn't work, try the following to
recover as much file content as possible:

dd if=<mkswapped_partition> | strings > allStrings.txt

Then read through the big allStrings.txt file and recover
paragraphs.
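
The block numbers listed in step 2 follow the sparse_super layout, where
backup superblocks live in group 1 and in groups whose number is a power of
3, 5 or 7. A Python sketch of that rule, assuming 32768 blocks per group and
a first data block of 0 (typical for 4 KiB blocks); these defaults and the
helper names are assumptions and must match the parameters the filesystem
was originally created with:

```python
def backup_groups(group_count: int):
    """Block groups that hold a backup superblock under sparse_super:
    group 1 plus every power of 3, 5 and 7 below group_count."""
    groups = set()
    if group_count > 1:
        groups.add(1)
    for base in (3, 5, 7):
        group = base
        while group < group_count:
            groups.add(group)
            group *= base
    return sorted(groups)


def backup_superblock_blocks(group_count: int,
                             blocks_per_group: int = 32768,
                             first_data_block: int = 0):
    """Block numbers usable with 'e2fsck -b' (assumed defaults, see above)."""
    return [first_data_block + group * blocks_per_group
            for group in backup_groups(group_count)]


# 229 groups is an inference from the fsck log in the original post
# (groups #0 to #228).
print(backup_superblock_blocks(229))
```

For a filesystem of that size the list stops well before the larger block
numbers shown in the example output, so the value passed to e2fsck -b has to
be one that actually exists on the partition.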

 


From adilger at clusterfs.com  Fri Jul  8 18:20:25 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 8 Jul 2005 12:20:25 -0600
Subject: Accidentally issued "mkswap" on ext3 fs -- recovery possible?
In-Reply-To: <20050708173319.57811.qmail@web30202.mail.mud.yahoo.com>
References: <20050708160337.ALLYOURBASEAREBELONGTOUS.A15231@bla.fasel.org>
	<20050708173319.57811.qmail@web30202.mail.mud.yahoo.com>
Message-ID: <20050708182025.GC5335@schatzie.adilger.int>

On Jul 08, 2005  10:33 -0700, ha haha wrote:
> > I accidentally issued "mkswap" on a used ext3 fs partition (~30G) :-/
> > 
> > [...]
> > 
> > Here is the output of 'fsck.ext3 -n -v hdb1.img':
> > --8<--
> > e2fsck 1.35 (28-Feb-2004)
> > Couldn't find ext2 superblock, trying backup blocks...
> > hdb1.img was not cleanly unmounted, check forced.
> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Free blocks count wrong for group #0 (24043, counted=0).
> > Fix? no
> > 
> > Free blocks count wrong for group #1 (32250, counted=0).
> > Fix? no

Should have read the original email better.  This is only the free blocks count
summary, which is never up-to-date in a backup superblock + group descriptor.
The check shows everything is OK.

A good time to make a backup!

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From menscher at uiuc.edu  Fri Jul  8 18:55:05 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Fri, 8 Jul 2005 13:55:05 -0500 (CDT)
Subject: [Q] Is this true and does it mean there is dynamic defragmentation in
 ext2/3?
In-Reply-To: <42B7A0A6.7010806@iki.fi>
References: <a0621024ebed8fb13822d@[129.98.90.227]>
	<20050618191451.GC16314@thunk.org> <42B7A0A6.7010806@iki.fi>
Message-ID: <Pine.LNX.4.62.0507081346470.19460@lx2.physics.uiuc.edu>

[Resurrecting an ancient thread...]

On Tue, 21 Jun 2005, Markus Peuhkuri wrote:
> Theodore Ts'o wrote:
>
>> Ext2/3 has advanced algorithms to make sure that the blocks that are
>> allocated avoid fragmentation, but it is not doing any kind of dynamic
>>
> And there is a tool 'filefrag' in e2fsprogs that reports how fragmented
> a particular file is.  If your disk grows full (over 90-95%, depending
> on file sizes etc..) then it is more difficult to find continuous blocks
> for files.  Now, if you delete files, then new files most probably are
> non-fragmented but those files that were written when disk was full are
> still fragmented.
>
> You can "unfragment" those files just by copying them and deleting old
> ones (if you have plenty of free space), but as Damian told, you must be
> careful with locks and nfs handles.

I have a user who is complaining of general slowness on his machine 
(RH9, so e2fsprogs-1.32-6).  The slowness is particularly bad when he 
types "mail" to read his email.  I can't find anything wrong with the 
system itself (no cpu load, free ram, no heavy disk activity, etc) but 
did note that when opening his mailbox (only 26M, so nothing huge) it 
does hang for 5-10 secs, apparently waiting on disk.  Running filefrag 
on his mailbox indicates it has 1119 extents.  I suspect that may be the 
source of the problem.

Now, I realize some fragmentation is normal, but I'd have expected that 
a mailbox with 328 messages wouldn't get more than 328 fragments (or 
possibly twice that many) maximum.  And the filesystem is only 42% full, 
so that's not causing extra fragmentation.

Is there anything I should be watching for here, or should I just give 
up and copy the file?
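
As a rough back-of-the-envelope check of those figures, here is a small
Python sketch. It assumes "26M" means 26 MiB and a 4 KiB block size; neither
is stated above, and the average_extent() helper is hypothetical:

```python
def average_extent(file_bytes: int, extents: int, block_size: int = 4096):
    """Return the mean extent size in bytes and in blocks.

    block_size is an assumption (4 KiB), not a value taken from the system.
    """
    avg_bytes = file_bytes / extents
    return avg_bytes, avg_bytes / block_size


mailbox_bytes = 26 * 1024 * 1024   # assumes "26M" means 26 MiB
extents = 1119                     # as reported by filefrag
messages = 328

avg_bytes, avg_blocks = average_extent(mailbox_bytes, extents)
print("average extent: %.0f bytes (%.1f blocks)" % (avg_bytes, avg_blocks))
print("extents per message: %.2f" % (extents / messages))
```

Under those assumptions the mailbox averages a little under six blocks per
extent and between three and four extents per message.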

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-


From menscher at uiuc.edu  Fri Jul  8 21:55:55 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Fri, 8 Jul 2005 16:55:55 -0500 (CDT)
Subject: filesystem fragmentation stats?
Message-ID: <Pine.LNX.4.62.0507081651340.19460@lx2.physics.uiuc.edu>

Let me preface this by saying "Yes, I know *nix filesystems don't need 
to worry about fragmentation".

That said, is there a way to check the overall level of fragmentation of 
a live ext3 filesystem?  I know about filefrag, but that's for specific 
files.  And I think e2fsck tells you, but only if you take the 
filesystem offline for the scan.  Is there anything that will give a 
percentage for a *live* filesystem?  (I have a fs that's been at >95% 
usage for quite some time, and I want to check for any fragmentation 
that could have resulted.)

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-


From evilninja at gmx.net  Fri Jul  8 22:09:00 2005
From: evilninja at gmx.net (evilninja)
Date: Sat, 09 Jul 2005 00:09:00 +0200
Subject: filesystem fragmentation stats?
In-Reply-To: <Pine.LNX.4.62.0507081651340.19460@lx2.physics.uiuc.edu>
References: <Pine.LNX.4.62.0507081651340.19460@lx2.physics.uiuc.edu>
Message-ID: <42CEF97C.4080807@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Damian Menscher wrote:
> Let me preface this by saying "Yes, I know *nix filesystems don't need
> to worry about fragmentation".

this was discussed *very* and again in june by Theodore:

https://www.redhat.com/archives/ext3-users/2005-June/msg00026.html

> That said, is there a way to check the overall level of fragmentation of
> a live ext3 filesystem?  I know about filefrag, but that's for specific
> files.  And I think e2fsck tells you, but only if you take the
> filesystem offline for the scan. 

"tune2fs -l" tells you about "Fragments per group", and "fsck.ext2 -nv"
opens the fs read-only and prints some nice stats after that.


- --
BOFH excuse #341:

HTTPD Error 666 : BOFH was here
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCzvl8C/PVm5+NVoYRAvc7AJwKRTYhWussiZquiawLNZzjSnSJ7ACg7uoU
39MB4i90ajg+ckER52pqfZ4=
=LFM+
-----END PGP SIGNATURE-----


From evilninja at gmx.net  Fri Jul  8 22:10:09 2005
From: evilninja at gmx.net (evilninja)
Date: Sat, 09 Jul 2005 00:10:09 +0200
Subject: filesystem fragmentation stats?
In-Reply-To: <42CEF97C.4080807@gmx.net>
References: <Pine.LNX.4.62.0507081651340.19460@lx2.physics.uiuc.edu>
	<42CEF97C.4080807@gmx.net>
Message-ID: <42CEF9C1.7050908@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

evilninja wrote:
> Damian Menscher wrote:
> 
>>>Let me preface this by saying "Yes, I know *nix filesystems don't need
>>>to worry about fragmentation".
> 
> 
> this was discussed *very* and again in june by Theodore:
- ---------------------------^ often ;-)

- --
BOFH excuse #422:

Someone else stole your IP address, call the Internet detectives!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCzvnAC/PVm5+NVoYRAgfTAKDuGNL6CpEcjdIKIVkW8vNBg0+7OACg1L90
7qB6it50hyYlY7hIYq3Gx4U=
=omCQ
-----END PGP SIGNATURE-----


From menscher at uiuc.edu  Sat Jul  9 07:15:41 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Sat, 9 Jul 2005 02:15:41 -0500 (CDT)
Subject: filesystem fragmentation stats?
In-Reply-To: <42CEF97C.4080807@gmx.net>
References: <Pine.LNX.4.62.0507081651340.19460@lx2.physics.uiuc.edu>
	<42CEF97C.4080807@gmx.net>
Message-ID: <C52A06C0689BBC498087200C29E6753C193247@phyexha.physics.uiuc.edu>

On Sat, 9 Jul 2005, evilninja wrote:
> Damian Menscher wrote:
>> That said, is there a way to check the overall level of fragmentation of
>> a live ext3 filesystem?  I know about filefrag, but that's for specific
>> files.  And I think e2fsck tells you, but only if you take the
>> filesystem offline for the scan.
>
> "tune2fs -l" tells you about "Fragments per group", and "fsck.ext2 -nv"
> opens the fs read-only and print some nice stats after that.

I noticed the "Fragments per group", but haven't been able to find 
anything that documents what it means.  Could someone here comment?

Running e2fsck -nvf gave the info I was looking for:

On my mail partition:
       93 inodes used (0%)
       30 non-contiguous inodes (32.3%)
          # of inodes with ind/dind/tind blocks: 44/17/0
    60540 blocks used (45%)
        0 bad blocks
        0 large files

       80 regular files
        4 directories
--------
       84 files

On my home partition:
   191326 inodes used (7%)
    11084 non-contiguous inodes (5.8%)
          # of inodes with ind/dind/tind blocks: 21649/707/0
  4581594 blocks used (86%)
        0 bad blocks
        0 large files

   174096 regular files
    15787 directories
        5 fifos
      274 links
     1421 symbolic links (1356 fast symbolic links)
        8 sockets
--------
   191591 files


So, interestingly, the home directories haven't gotten too fragmented 
despite being at >90% usage for several months (much of that time at 
>95%).  Apparently the 5% reserved for the system factors in here. 
That's certainly a relief!

On the other hand, the mail spools are getting horribly fragmented 
(>30%), probably because mail programs are deleting messages out of the 
middle of the spools?  It's hard to imagine any other reason for 30% 
fragmentation on a filesystem that's less than half full.  (Another 
system I manage also shows high fragmentation for the mail spool, so I 
think this must be a generic problem.)
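
The percentages e2fsck prints can be reproduced from the two counts in 
each summary.  A minimal Python sketch, assuming the summary lines look 
exactly like the ones pasted above (the function name and the regular 
expressions are illustrative, not part of e2fsprogs):

```python
import re


def noncontiguous_percent(summary: str) -> float:
    """Return non-contiguous inodes as a percentage of inodes used.

    Assumes the e2fsck -v summary wording shown in this message.
    """
    used = int(re.search(r"(\d+)\s+inodes used", summary).group(1))
    frag = int(re.search(r"(\d+)\s+non-contiguous inodes", summary).group(1))
    return 100.0 * frag / used


# Summary lines copied from the two e2fsck runs above.
mail = """
       93 inodes used (0%)
       30 non-contiguous inodes (32.3%)
"""
home = """
   191326 inodes used (7%)
    11084 non-contiguous inodes (5.8%)
"""

print(f"mail: {noncontiguous_percent(mail):.1f}%")
print(f"home: {noncontiguous_percent(home):.1f}%")
```

Note that this counts files with at least one discontinuity, so a 
single large mailbox in two pieces weighs the same as one in hundreds.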

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-


From bunk at stusta.de  Tue Jul 12 20:27:42 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Tue, 12 Jul 2005 22:27:42 +0200
Subject: [2.6 patch] fs/jbd/: possible cleanups
Message-ID: <20050712202742.GM4034@stusta.de>

This patch contains the following possible cleanups:
- make needlessly global functions static
- journal.c: remove the unused global function __journal_internal_check
             and move the check to journal_init
- remove the following write-only global variable:
  - journal.c: current_journal
- remove the following unneeded EXPORT_SYMBOL's:
  - journal.c: journal_check_used_features
  - journal.c: journal_recover

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

This patch was already sent on:
- 3 Jul 2005
- 14 Jun 2005

 fs/jbd/journal.c    |   41 ++++++++++++++++++-----------------------
 fs/jbd/revoke.c     |    3 ++-
 include/linux/jbd.h |    3 ---
 3 files changed, 20 insertions(+), 27 deletions(-)

--- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old	2005-06-14 03:58:20.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h	2005-06-14 04:00:56.000000000 +0200
@@ -900,8 +900,6 @@
 				int start, int len, int bsize);
 extern journal_t * journal_init_inode (struct inode *);
 extern int	   journal_update_format (journal_t *);
-extern int	   journal_check_used_features 
-		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   journal_check_available_features 
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   journal_set_features 
@@ -914,7 +912,6 @@
 extern int	   journal_skip_recovery	(journal_t *);
 extern void	   journal_update_superblock	(journal_t *, int);
 extern void	   __journal_abort_hard	(journal_t *);
-extern void	   __journal_abort_soft	(journal_t *, int);
 extern void	   journal_abort      (journal_t *, int);
 extern int	   journal_errno      (journal_t *);
 extern void	   journal_ack_err    (journal_t *);
--- linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c.old	2005-06-14 03:57:39.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/fs/jbd/journal.c	2005-06-14 04:08:24.000000000 +0200
@@ -59,13 +59,11 @@
 EXPORT_SYMBOL(journal_init_dev);
 EXPORT_SYMBOL(journal_init_inode);
 EXPORT_SYMBOL(journal_update_format);
-EXPORT_SYMBOL(journal_check_used_features);
 EXPORT_SYMBOL(journal_check_available_features);
 EXPORT_SYMBOL(journal_set_features);
 EXPORT_SYMBOL(journal_create);
 EXPORT_SYMBOL(journal_load);
 EXPORT_SYMBOL(journal_destroy);
-EXPORT_SYMBOL(journal_recover);
 EXPORT_SYMBOL(journal_update_superblock);
 EXPORT_SYMBOL(journal_abort);
 EXPORT_SYMBOL(journal_errno);
@@ -81,6 +79,7 @@
 EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
+static void __journal_abort_soft (journal_t *journal, int errno);
 
 /*
  * Helper function used to manage commit timeouts
@@ -93,16 +92,6 @@
 	wake_up_process(p);
 }
 
-/* Static check for data structure consistency.  There's no code
- * invoked --- we'll just get a linker failure if things aren't right.
- */
-void __journal_internal_check(void)
-{
-	extern void journal_bad_superblock_size(void);
-	if (sizeof(struct journal_superblock_s) != 1024)
-		journal_bad_superblock_size();
-}
-
 /*
  * kjournald: The main thread function used to manage a logging device
  * journal.
@@ -119,16 +108,12 @@
  *    known as checkpointing, and this thread is responsible for that job.
  */
 
-journal_t *current_journal;		// AKPM: debug
-
-int kjournald(void *arg)
+static int kjournald(void *arg)
 {
 	journal_t *journal = (journal_t *) arg;
 	transaction_t *transaction;
 	struct timer_list timer;
 
-	current_journal = journal;
-
 	daemonize("kjournald");
 
 	/* Set up an interval timer which can be used to trigger a
@@ -1181,8 +1166,10 @@
  * features.  Return true (non-zero) if it does. 
  **/
 
-int journal_check_used_features (journal_t *journal, unsigned long compat,
-				 unsigned long ro, unsigned long incompat)
+static int journal_check_used_features (journal_t *journal,
+					unsigned long compat,
+					unsigned long ro,
+					unsigned long incompat)
 {
 	journal_superblock_t *sb;
 
@@ -1439,7 +1426,7 @@
  * device this journal is present.
  */
 
-const char *journal_dev_name(journal_t *journal, char *buffer)
+static const char *journal_dev_name(journal_t *journal, char *buffer)
 {
 	struct block_device *bdev;
 
@@ -1485,7 +1472,7 @@
 
 /* Soft abort: record the abort error status in the journal superblock,
  * but don't do any other IO. */
-void __journal_abort_soft (journal_t *journal, int errno)
+static void __journal_abort_soft (journal_t *journal, int errno)
 {
 	if (journal->j_flags & JFS_ABORT)
 		return;
@@ -1888,7 +1875,7 @@
 
 static struct proc_dir_entry *proc_jbd_debug;
 
-int read_jbd_debug(char *page, char **start, off_t off,
+static int read_jbd_debug(char *page, char **start, off_t off,
 			  int count, int *eof, void *data)
 {
 	int ret;
@@ -1898,7 +1885,7 @@
 	return ret;
 }
 
-int write_jbd_debug(struct file *file, const char __user *buffer,
+static int write_jbd_debug(struct file *file, const char __user *buffer,
 			   unsigned long count, void *data)
 {
 	char buf[32];
@@ -1987,6 +1974,14 @@
 {
 	int ret;
 
+/* Static check for data structure consistency.  There's no code
+ * invoked --- we'll just get a linker failure if things aren't right.
+ */
+	extern void journal_bad_superblock_size(void);
+	if (sizeof(struct journal_superblock_s) != 1024)
+		journal_bad_superblock_size();
+
+
 	ret = journal_init_caches();
 	if (ret != 0)
 		journal_destroy_caches();
--- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old	2005-06-14 03:58:36.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c	2005-06-14 03:58:41.000000000 +0200
@@ -116,7 +116,8 @@
 		(block << (hash_shift - 12))) & (table->hash_size - 1);
 }
 
-int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq)
+static int insert_revoke_hash(journal_t *journal, unsigned long blocknr,
+			      tid_t seq)
 {
 	struct list_head *hash_list;
 	struct jbd_revoke_record_s *record;


From adilger at clusterfs.com  Tue Jul 12 22:32:44 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 12 Jul 2005 16:32:44 -0600
Subject: [2.6 patch] fs/jbd/: possible cleanups
In-Reply-To: <20050712202742.GM4034@stusta.de>
References: <20050712202742.GM4034@stusta.de>
Message-ID: <20050712223243.GW5335@schatzie.adilger.int>

On Jul 12, 2005  22:27 +0200, Adrian Bunk wrote:
> - make needlessly global functions static

I had previously commented on this patch:

> - journal.c: remove the unused global function __journal_internal_check
>              and move the check to journal_init

I don't mind removing this function, but it shouldn't be put inside #ifdef
JBD_DEBUG, as that would remove the check from the compiler-parsed code
and defeat the purpose of the check.
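
To illustrate why such a check has to live in code that is always
processed, here is a rough Python analogue (not the kernel's C).
DemoSuperblock and its fields are made up for illustration and do not
reflect the real journal_superblock_s layout; only the 1024-byte size
comes from the patch.

```python
import ctypes

EXPECTED_SIZE = 1024  # the size the patch checks for


class DemoSuperblock(ctypes.Structure):
    """Hypothetical fixed-size record; NOT the real journal superblock."""
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("blocktype", ctypes.c_uint32),
        ("sequence", ctypes.c_uint32),
        # Pad the three 4-byte fields out to the expected total size.
        ("padding", ctypes.c_uint8 * (EXPECTED_SIZE - 12)),
    ]


def check_size(struct_type, expected):
    """Raise if the structure's size differs from the expected size."""
    actual = ctypes.sizeof(struct_type)
    if actual != expected:
        raise RuntimeError(
            f"{struct_type.__name__} is {actual} bytes, expected {expected}"
        )
    return actual


# Runs unconditionally at import time.  Hiding this call behind a debug
# switch that is normally off would mean the check never runs at all.
check_size(DemoSuperblock, EXPECTED_SIZE)
```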

> - remove the following write-only global variable:
>   - journal.c: current_journal

Seems fine.

> - remove the following unneeded EXPORT_SYMBOL's:
>   - journal.c: journal_check_used_features

Should be kept for API completeness.

> - remove the following unneeded EXPORT_SYMBOL's:
>   - journal.c: journal_recover

Doesn't appear usable in any case, should be removed.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From bunk at stusta.de  Tue Jul 12 22:43:53 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Wed, 13 Jul 2005 00:43:53 +0200
Subject: [2.6 patch] fs/jbd/: possible cleanups
In-Reply-To: <20050712223243.GW5335@schatzie.adilger.int>
References: <20050712202742.GM4034@stusta.de>
	<20050712223243.GW5335@schatzie.adilger.int>
Message-ID: <20050712224353.GN4034@stusta.de>

On Tue, Jul 12, 2005 at 04:32:44PM -0600, Andreas Dilger wrote:
> On Jul 12, 2005  22:27 +0200, Adrian Bunk wrote:
>...
> > - journal.c: remove the unused global function __journal_internal_check
> >              and move the check to journal_init
> 
> I don't mind removing this function, but it shouldn't be put inside #ifdef
> JBD_DEBUG, as that would remove the check from the compiler-parsed code
> and defeat the purpose of the check.

???

That's not what my patch is doing.

journal_init() is not inside an #ifdef JBD_DEBUG.

>...
> > - remove the following unneeded EXPORT_SYMBOL's:
> >   - journal.c: journal_check_used_features
> 
> Should be kept for API completeness.
>...

The function itself isn't removed.

Does it really have to stay exported, or isn't it enough to re-export it 
when a user appears?

> Cheers, Andreas

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


From adilger at clusterfs.com  Tue Jul 12 23:05:39 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 12 Jul 2005 17:05:39 -0600
Subject: [2.6 patch] fs/jbd/: possible cleanups
In-Reply-To: <20050712224353.GN4034@stusta.de>
References: <20050712202742.GM4034@stusta.de>
	<20050712223243.GW5335@schatzie.adilger.int>
	<20050712224353.GN4034@stusta.de>
Message-ID: <20050712230539.GX5335@schatzie.adilger.int>

On Jul 13, 2005  00:43 +0200, Adrian Bunk wrote:
> On Tue, Jul 12, 2005 at 04:32:44PM -0600, Andreas Dilger wrote:
> > I don't mind removing this function, but it shouldn't be put inside #ifdef
> > JBD_DEBUG, as that would remove the check from the compiler-parsed code
> > and defeat the purpose of the check.
> 
> That's not what my patch is doing.
> 
> journal_init() is not inside an #ifdef JBD_DEBUG.

My bad.  You didn't generate the diff with -p (which I normally do, and
which is incredibly useful when reviewing patches), and I saw
"write_jbd_debug()" above, so my brain went on autopilot and assumed the
code had moved into that function.  Objection withdrawn.

Cheers, Andreas


From jwbaker at acm.org  Thu Jul 14 00:12:26 2005
From: jwbaker at acm.org (Jeffrey W. Baker)
Date: Wed, 13 Jul 2005 17:12:26 -0700
Subject: a comparison of ext3, jfs, and xfs on hardware raid
Message-ID: <1121299946.20950.26.camel@toonses.gghcwest.com>

I'm setting up a new file server and I just can't seem to get the
expected performance from ext3.  Unfortunately I'm stuck with ext3 due
to my use of Lustre.  So I'm hoping you dear readers will send me some
tips for increasing ext3 performance.

The system is using an Areca hardware raid controller with 5 7200RPM
SATA disks.  The RAID controller has 128MB of cache and the disks each
have 8MB.  The cache is write-back.  The system is Linux 2.6.12 on amd64
with 1GB system memory.

Using bonnie++ with a 10GB fileset, in MB/s:

         ext3    jfs    xfs
Read     112     188    141
Write     97     157    167
Rewrite   51      71     60

These numbers were obtained using the mkfs defaults for all filesystems
and the deadline scheduler.  As you can see JFS is kicking butt on this
test.

Next I used pgbench to test parallel random I/O.  pgbench has a
configurable number of clients and transactions per client, and can
change the size of its database.  I used a database of 100 million
tuples (scale factor 1000).  I timed 100,000 transactions on each
filesystem, with 10 and 100 clients per run.  Figures are in
transactions per second.


              ext3  jfs  xfs
10 Clients      55   81   68
100 Clients     61  100   64

Here XFS is not substantially faster but JFS continues to lead.  

JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
bonnie++ linear I/O.
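
Those percentages can be recomputed from the two tables.  A minimal
Python sketch; the dictionary and function names are only for
illustration, and the figures are the ones listed above:

```python
# Benchmark figures copied from the tables in this message.
BONNIE_MB_S = {
    "Read": {"ext3": 112, "jfs": 188, "xfs": 141},
    "Write": {"ext3": 97, "jfs": 157, "xfs": 167},
    "Rewrite": {"ext3": 51, "jfs": 71, "xfs": 60},
}
PGBENCH_TPS = {
    "10 Clients": {"ext3": 55, "jfs": 81, "xfs": 68},
    "100 Clients": {"ext3": 61, "jfs": 100, "xfs": 64},
}


def percent_faster(results, fs, baseline="ext3"):
    """Percent by which `fs` beats `baseline` in each test, rounded."""
    return {
        test: round(100.0 * (row[fs] / row[baseline] - 1.0))
        for test, row in results.items()
    }


if __name__ == "__main__":
    print("bonnie++:", percent_faster(BONNIE_MB_S, "jfs"))
    print("pgbench: ", percent_faster(PGBENCH_TPS, "jfs"))
```

For JFS against ext3 this gives 68%, 62% and 39% on the bonnie++ rows,
and 47% and 64% on the pgbench rows.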

Are there any tunables that I might want to adjust to get better
performance from ext3?

-jwb


From adilger at clusterfs.com  Thu Jul 14 18:33:30 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 14 Jul 2005 12:33:30 -0600
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <1121299946.20950.26.camel@toonses.gghcwest.com>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
Message-ID: <20050714183330.GN5335@schatzie.adilger.int>

On Jul 13, 2005  17:12 -0700, Jeffrey W. Baker wrote:
> I'm setting up a new file server and I just can't seem to get the
> expected performance from ext3.  Unfortunately I'm stuck with ext3 due
> to my use of Lustre.  So I'm hoping you dear readers will send me some
> tips for increasing ext3 performance.
> 
> The system is using an Areca hardware raid controller with 5 7200RPM
> SATA disks.  The RAID controller has 128MB of cache and the disks each
> have 8MB.  The cache is write-back.  The system is Linux 2.6.12 on amd64
> with 1GB system memory.
> 
> Using bonnie++ with a 10GB fileset, in MB/s:
> 
>          ext3    jfs    xfs
> Read     112     188    141
> Write     97     157    167
> Rewrite   51      71     60
> 
> These number were obtained using the mkfs defaults for all filesystems
> and the deadline scheduler.  As you can see JFS is kicking butt on this
> test.

One thing that is important for Lustre is performance of EAs.  See
http://samba.org/~tridge/xattr_results/ for a comparison.  Lustre
uses large inodes (-I 256 or larger) to store the EAs efficiently.

> Next I used pgbench to test parallel random I/O.  pgbench has
> configurable number of clients and transactions per client, and can
> change the size of its database.  I used a database of 100 million
> tuples (scale factor 1000).  I times 100,000 transactions on each
> filesystem, with 10 and 100 clients per run.  Figures are in
> transactions per second.
> 
>               ext3  jfs  xfs
> 10 Clients      55   81   68
> 100 Clients     61  100   64
> 
> Here XFS is not substantially faster but JFS continues to lead.  
> 
> JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
> bonnie++ linear I/O.

This is a bit surprising; I've never heard of JFS as a leader in many
performance tests.  Is pgbench at all related to dbench?  The problem
with dbench is that it reports its best results in cases where the
filesystem does no IO at all.  In real life the data has to make it to
disk at some point.

See http://sudhaa.com/~benchmark/ext3/newtiobenchresults.ext3gold/newtiobench/newtiobench.html
for a comparison of ext3, xfs, jfs in the mode that Lustre runs in
(specifically column 7, 14, 18).

> Are there any tunables that I might want to adjust to get better
> performance from ext3?

Try creating your ext3 filesystem with a larger journal, as Lustre does:

mkfs -J size=400 ...

size is in MB, 400 might be excessive for your setup - I'd be interested
in hearing where the "sweet spot" is for journal size.  The latest e2fsprogs
use 128MB as the largest default size (up from 32MB) for large filesystems.
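
For a sense of scale, the journal sizes mentioned here can be converted
to filesystem blocks.  A small Python sketch, assuming a 4 KiB block
size (the block size is an assumption; it is not stated in this thread):

```python
def journal_blocks(size_mb: int, block_size: int = 4096) -> int:
    """Number of filesystem blocks occupied by a journal of size_mb MB."""
    return size_mb * 1024 * 1024 // block_size


# Old default maximum, new default maximum, and the value suggested above.
for size_mb in (32, 128, 400):
    print(f"{size_mb:4d} MB journal = {journal_blocks(size_mb):6d} blocks")
```

With 4 KiB blocks, 32 MB is 8192 blocks, 128 MB is 32768 blocks and
400 MB is 102400 blocks.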

Cheers, Andreas


From sonny at burdell.org  Thu Jul 14 18:53:16 2005
From: sonny at burdell.org (Sonny Rao)
Date: Thu, 14 Jul 2005 14:53:16 -0400
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
	<20050714183330.GN5335@schatzie.adilger.int>
Message-ID: <20050714185316.GA25794@kevlar.burdell.org>

On Thu, Jul 14, 2005 at 12:33:30PM -0600, Andreas Dilger wrote:
> On Jul 13, 2005  17:12 -0700, Jeffrey W. Baker wrote:
> > I'm setting up a new file server and I just can't seem to get the
> > expected performance from ext3.  Unfortunately I'm stuck with ext3 due
> > to my use of Lustre.  So I'm hoping you dear readers will send me some
> > tips for increasing ext3 performance.
> > 
> > The system is using an Areca hardware raid controller with 5 7200RPM
> > SATA disks.  The RAID controller has 128MB of cache and the disks each
> > have 8MB.  The cache is write-back.  The system is Linux 2.6.12 on amd64
> > with 1GB system memory.
> > 
> > Using bonnie++ with a 10GB fileset, in MB/s:
> > 
> >          ext3    jfs    xfs
> > Read     112     188    141
> > Write     97     157    167
> > Rewrite   51      71     60
> > 
> > These number were obtained using the mkfs defaults for all filesystems
> > and the deadline scheduler.  As you can see JFS is kicking butt on this
> > test.
> 
> One thing that is important for Lustre is performance of EAs.  See
> http://samba.org/~tridge/xattr_results/ for a comparison.  Lustre
> uses large inodes (-I 256 or larger) to store the EAs efficiently.
> 
> > Next I used pgbench to test parallel random I/O.  pgbench has
> > configurable number of clients and transactions per client, and can
> > change the size of its database.  I used a database of 100 million
> > tuples (scale factor 1000).  I times 100,000 transactions on each
> > filesystem, with 10 and 100 clients per run.  Figures are in
> > transactions per second.
> > 
> >               ext3  jfs  xfs
> > 10 Clients      55   81   68
> > 100 Clients     61  100   64
> > 
> > Here XFS is not substantially faster but JFS continues to lead.  
> > 
> > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
> > bonnie++ linear I/O.
> 
> This is a bit surprising, I've never heard JFS as a leader in many
> performance tests.  Is pgbench at all related to dbench?  The problem
> with dbench is that for cases where the filesystem does no IO at all
> it reports a best result.  In real life the data has to make it to
> disk at some point.

JFS tends to lead in two areas: low CPU utilization compared to other
filesystems and, on a new filesystem, a layout that is generally very
good.

The low CPU utilization helps in environments where you have a lot of
filesystems or just a lot of I/O going on; we've seen on SPEC SFS that
JFS tends to be the best because of that.  (Yes, SPEC SFS is a rather
crazy workload, but then so are a lot of other common ones.)

JFS's main weak point is on meta-data intensive workloads (like
dbench) because of deficiencies in the logging system and some
poorly placed synchronous operations which are currently being
tackled. 

We've also been slowly pushing in changes to improve JFS performance;
some of them have made it into 2.6.12.

Sonny


From jwbaker at acm.org  Thu Jul 14 18:56:15 2005
From: jwbaker at acm.org (Jeffrey W. Baker)
Date: Thu, 14 Jul 2005 11:56:15 -0700
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
	<20050714183330.GN5335@schatzie.adilger.int>
Message-ID: <1121367375.20950.64.camel@toonses.gghcwest.com>

On Thu, 2005-07-14 at 12:33 -0600, Andreas Dilger wrote:
> On Jul 13, 2005  17:12 -0700, Jeffrey W. Baker wrote:
> > Using bonnie++ with a 10GB fileset, in MB/s:
> > 
> >          ext3    jfs    xfs
> > Read     112     188    141
> > Write     97     157    167
> > Rewrite   51      71     60
> > 
> > These number were obtained using the mkfs defaults for all filesystems
> > and the deadline scheduler.  As you can see JFS is kicking butt on this
> > test.
> 
> One thing that is important for Lustre is performance of EAs.  See
> http://samba.org/~tridge/xattr_results/ for a comparison.  Lustre
> uses large inodes (-I 256 or larger) to store the EAs efficiently.

Is this important only for the metadata backend, or for OSTs as
well?

> > Next I used pgbench to test parallel random I/O.  pgbench has
> > configurable number of clients and transactions per client, and can
> > change the size of its database.  I used a database of 100 million
> > tuples (scale factor 1000).  I times 100,000 transactions on each
> > filesystem, with 10 and 100 clients per run.  Figures are in
> > transactions per second.
> > 
> >               ext3  jfs  xfs
> > 10 Clients      55   81   68
> > 100 Clients     61  100   64
> > 
> > Here XFS is not substantially faster but JFS continues to lead.  
> > 
> > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
> > bonnie++ linear I/O.
> 
> This is a bit surprising, I've never heard JFS as a leader in many
> performance tests.  Is pgbench at all related to dbench?  The problem
> with dbench is that for cases where the filesystem does no IO at all
> it reports a best result.  In real life the data has to make it to
> disk at some point.

pgbench comes in postgresql's contrib.  Believe me, the filesystem does
plenty of I/O.  It sustains roughly 600 iops for 15-20 minutes.  The
"scale factor of 1000" means pgbench is using a database with 100
million tuples, or about 16GB of data.  The entire run uses up only
about 2 minutes of CPU time.  
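
As a rough cross-check of those figures, here is a small Python sketch.
The 100,000 tuples per scale unit is implied by "scale factor 1000"
giving 100 million tuples; 16GB is treated as 16e9 bytes, and the
function names are only for illustration:

```python
TUPLES_PER_SCALE_UNIT = 100_000  # implied: scale 1000 -> 100 million tuples


def tuples_for_scale(scale: int) -> int:
    """Number of tuples in the main table for a given scale factor."""
    return scale * TUPLES_PER_SCALE_UNIT


def bytes_per_tuple(total_bytes: float, tuples: int) -> float:
    """Average on-disk footprint per tuple, indexes and overhead included."""
    return total_bytes / tuples


def total_ios(iops: int, minutes: int) -> int:
    """Total I/O operations at a sustained rate over a run of given length."""
    return iops * minutes * 60


n = tuples_for_scale(1000)
print("tuples:", n)
print("bytes per tuple:", bytes_per_tuple(16e9, n))
print("I/Os over 15 to 20 minutes:", total_ios(600, 15), "to", total_ios(600, 20))
```

That works out to about 160 bytes per tuple and somewhere between
540,000 and 720,000 I/O operations per run.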

> 
> See http://sudhaa.com/~benchmark/ext3/newtiobenchresults.ext3gold/newtiobench/newtiobench.html
> for a comparison of ext3, xfs, jfs in the mode that Lustre runs in
> (specifically column 7, 14, 18).
> 
> > Are there any tunables that I might want to adjust to get better
> > performance from ext3?
> 
> Try creating your ext3 filesystem with a larger journal, as Lustre does:
> 
> mkfs -J size=400 ...
> 
> size is in MB, 400 might be excessive for your setup - I'd be interested
> in hearing where the "sweet spot" is for journal size.  The latest e2fsprogs
> use 128MB as the largest default size (up from 32MB) for large filesystems.

I intend to run many more benchmarks using various ext3 mount options.
I'll make sure to modulate the journal size as well.  However, it is my
impression that mballoc/delalloc/extents will be of use mainly to
workloads like tarring and untarring a large archive.  For linear reads
of one giant file, will these mount options make any difference?

Regards,
Jeffrey


From sonny at burdell.org  Thu Jul 14 23:49:29 2005
From: sonny at burdell.org (Sonny Rao)
Date: Thu, 14 Jul 2005 19:49:29 -0400
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <1121367375.20950.64.camel@toonses.gghcwest.com>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
	<20050714183330.GN5335@schatzie.adilger.int>
	<1121367375.20950.64.camel@toonses.gghcwest.com>
Message-ID: <20050714234929.GA27538@kevlar.burdell.org>

On Thu, Jul 14, 2005 at 11:56:15AM -0700, Jeffrey W. Baker wrote:
<snip> 
> I intend to run many more benchmarks using various ext3 mount options.
> I'll make sure to modulate the journal size as well.  However, it is my
> impression that mballoc/delalloc/extents will be of use mainly to
> workloads like tarring and untarring a large archive.  For linear reads
> of one giant file, will these mount options make any difference?

The difference they will make will be in terms of file layout, because
they will give you better layout during creation which will give you
higher sustained throughput during your linear reads.  

Check out the ext2-devel mailing list back in Feb-March of this year
for some benchmark info about the difference these options make on
sequential read/write tests.

Sonny


From jwbaker at acm.org  Sat Jul 16 17:37:52 2005
From: jwbaker at acm.org (Jeffrey W. Baker)
Date: Sat, 16 Jul 2005 10:37:52 -0700
Subject: a comparison of ext3, jfs, and xfs on hardware raid
In-Reply-To: <20050714183330.GN5335@schatzie.adilger.int>
References: <1121299946.20950.26.camel@toonses.gghcwest.com>
	<20050714183330.GN5335@schatzie.adilger.int>
Message-ID: <1121535472.7101.37.camel@noodles>

On Thu, 2005-07-14 at 12:33 -0600, Andreas Dilger wrote:
> On Jul 13, 2005  17:12 -0700, Jeffrey W. Baker wrote:
...
> > The system is using an Areca hardware raid controller with 5 7200RPM
> > SATA disks.  The RAID controller has 128MB of cache and the disks each
> > have 8MB.  The cache is write-back.  The system is Linux 2.6.12 on amd64
> > with 1GB system memory.
...
> > Next I used pgbench to test parallel random I/O.  pgbench has
> > configurable number of clients and transactions per client, and can
> > change the size of its database.  I used a database of 100 million
> > tuples (scale factor 1000).  I timed 100,000 transactions on each
> > filesystem, with 10 and 100 clients per run.  Figures are in
> > transactions per second.
> > 
> >               ext3  jfs  xfs
> > 10 Clients      55   81   68
> > 100 Clients     61  100   64
> > 
> > Here XFS is not substantially faster but JFS continues to lead.  
> > 
> > JFS is roughly 60% faster than ext3 on pgbench and 40-70% faster on
> > bonnie++ linear I/O.
> 
> This is a bit surprising, I've never heard of JFS as a leader in many
> performance tests.  Is pgbench at all related to dbench?  The problem
> with dbench is that for cases where the filesystem does no IO at all
> it reports a best result.  In real life the data has to make it to
> disk at some point.
...
> Try creating your ext3 filesystem with a larger journal, as Lustre does:
> 
> mkfs -J size=400 ...
> 
> size is in MB, 400 might be excessive for your setup - I'd be interested
> in hearing where the "sweet spot" is for journal size.  The latest e2fsprogs
> use 128MB as the largest default size (up from 32MB) for large filesystems.

The journal size doesn't seem to make any difference to pgbench, except
that 256MB seems to be the worst.  400MB and 32MB are roughly equal on
the pgbench workload.  400MB was the optimal journal size on the
bonnie++ workload.

Perhaps it is silly to benchmark a database with its journal files on a
journalling filesystem, but here is the result.

  journal          pgbench tps         bonnie++ MB/s
--------------------------------------------------------
size |  mode   |  1  | 10 | 100 | write | rewrite | read
--------------------------------------------------------
32     journal                      57      35      112
32     ordered   28    51    57     83      33      101
32    writeback  34    70    88     57      31      103
64     journal                      55      33      113
64     ordered   29    52    61     84      33      100
64    writeback  32    69    87     59      31      100
128    journal                      52      33      109
128    ordered   32    54    62     86      34      102
128   writeback  34    70    88     61      32      102
256    journal                      54      30      110
256    ordered   28    51    60     90      34      106
256   writeback  29    64    79     59      31      104
400    journal                      52      28      108
400    ordered   26    49    59     89      33      104
400   writeback  32    70    87     60      32      101
---     ext2                105    118      32      107
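
To read the table as relative differences rather than raw figures, a
small helper along these lines could be used.  This is only a sketch:
the function name percent_faster is made up here, and the numbers fed
to it are taken from the table above.

```c
#include <assert.h>

/*
 * Percentage by which `value` exceeds `baseline`.
 * A negative result means `value` is slower than `baseline`.
 */
static double percent_faster(double value, double baseline)
{
	assert(baseline > 0.0);	/* a zero baseline has no meaningful ratio */
	return (value - baseline) / baseline * 100.0;
}
```

For example, percent_faster(88, 57) compares writeback against ordered
at 100 clients with a 32MB journal, which works out to roughly 54%.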


-jwb


From bunk at stusta.de  Tue Jul 19 14:15:25 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Tue, 19 Jul 2005 16:15:25 +0200
Subject: [2.6 patch] fs/jbd/: cleanups
Message-ID: <20050719141525.GJ5031@stusta.de>

This patch contains the following cleanups:
- make needlessly global functions static
- journal.c: remove the unused global function __journal_internal_check
             and move the check to journal_init
- remove the following write-only global variable:
  - journal.c: current_journal
- remove the following unneeded EXPORT_SYMBOL:
  - journal.c: journal_recover

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 fs/jbd/journal.c    |   34 ++++++++++++++--------------------
 fs/jbd/revoke.c     |    3 ++-
 include/linux/jbd.h |    1 -
 3 files changed, 16 insertions(+), 22 deletions(-)

--- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old	2005-06-14 03:58:20.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h	2005-06-14 04:00:56.000000000 +0200
@@ -914,7 +912,6 @@
 extern int	   journal_skip_recovery	(journal_t *);
 extern void	   journal_update_superblock	(journal_t *, int);
 extern void	   __journal_abort_hard	(journal_t *);
-extern void	   __journal_abort_soft	(journal_t *, int);
 extern void	   journal_abort      (journal_t *, int);
 extern int	   journal_errno      (journal_t *);
 extern void	   journal_ack_err    (journal_t *);
--- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old	2005-06-14 03:58:36.000000000 +0200
+++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c	2005-06-14 03:58:41.000000000 +0200
@@ -116,7 +116,8 @@
 		(block << (hash_shift - 12))) & (table->hash_size - 1);
 }
 
-int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq)
+static int insert_revoke_hash(journal_t *journal, unsigned long blocknr,
+			      tid_t seq)
 {
 	struct list_head *hash_list;
 	struct jbd_revoke_record_s *record;

--- linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c.old	2005-07-19 15:53:16.000000000 +0200
+++ linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c	2005-07-19 15:53:39.000000000 +0200
@@ -65,7 +65,6 @@ EXPORT_SYMBOL(journal_set_features);
 EXPORT_SYMBOL(journal_create);
 EXPORT_SYMBOL(journal_load);
 EXPORT_SYMBOL(journal_destroy);
-EXPORT_SYMBOL(journal_recover);
 EXPORT_SYMBOL(journal_update_superblock);
 EXPORT_SYMBOL(journal_abort);
 EXPORT_SYMBOL(journal_errno);
@@ -81,6 +80,7 @@ EXPORT_SYMBOL(journal_try_to_free_buffer
 EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
+static void __journal_abort_soft (journal_t *journal, int errno);
 
 /*
  * Helper function used to manage commit timeouts
@@ -93,16 +93,6 @@ static void commit_timeout(unsigned long
 	wake_up_process(p);
 }
 
-/* Static check for data structure consistency.  There's no code
- * invoked --- we'll just get a linker failure if things aren't right.
- */
-void __journal_internal_check(void)
-{
-	extern void journal_bad_superblock_size(void);
-	if (sizeof(struct journal_superblock_s) != 1024)
-		journal_bad_superblock_size();
-}
-
 /*
  * kjournald: The main thread function used to manage a logging device
  * journal.
@@ -119,16 +109,12 @@ void __journal_internal_check(void)
  *    known as checkpointing, and this thread is responsible for that job.
  */
 
-journal_t *current_journal;		// AKPM: debug
-
-int kjournald(void *arg)
+static int kjournald(void *arg)
 {
 	journal_t *journal = (journal_t *) arg;
 	transaction_t *transaction;
 	struct timer_list timer;
 
-	current_journal = journal;
-
 	daemonize("kjournald");
 
 	/* Set up an interval timer which can be used to trigger a
@@ -1441,7 +1427,7 @@ int journal_wipe(journal_t *journal, int
  * device this journal is present.
  */
 
-const char *journal_dev_name(journal_t *journal, char *buffer)
+static const char *journal_dev_name(journal_t *journal, char *buffer)
 {
 	struct block_device *bdev;
 
@@ -1487,7 +1473,7 @@ void __journal_abort_hard(journal_t *jou
 
 /* Soft abort: record the abort error status in the journal superblock,
  * but don't do any other IO. */
-void __journal_abort_soft (journal_t *journal, int errno)
+static void __journal_abort_soft (journal_t *journal, int errno)
 {
 	if (journal->j_flags & JFS_ABORT)
 		return;
@@ -1890,7 +1876,7 @@ EXPORT_SYMBOL(journal_enable_debug);
 
 static struct proc_dir_entry *proc_jbd_debug;
 
-int read_jbd_debug(char *page, char **start, off_t off,
+static int read_jbd_debug(char *page, char **start, off_t off,
 			  int count, int *eof, void *data)
 {
 	int ret;
@@ -1900,7 +1886,7 @@ int read_jbd_debug(char *page, char **st
 	return ret;
 }
 
-int write_jbd_debug(struct file *file, const char __user *buffer,
+static int write_jbd_debug(struct file *file, const char __user *buffer,
 			   unsigned long count, void *data)
 {
 	char buf[32];
@@ -1989,6 +1975,14 @@ static int __init journal_init(void)
 {
 	int ret;
 
+/* Static check for data structure consistency.  There's no code
+ * invoked --- we'll just get a linker failure if things aren't right.
+ */
+	extern void journal_bad_superblock_size(void);
+	if (sizeof(struct journal_superblock_s) != 1024)
+		journal_bad_superblock_size();
+
+
 	ret = journal_init_caches();
 	if (ret != 0)
 		journal_destroy_caches();
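
The size check moved into journal_init above only fails at link time
when the structure size is wrong.  As an aside, a compile-time variant
of the same idea might look like the sketch below.  It assumes a C11
compiler (for static_assert) and a 4-byte unsigned int; the struct
demo_superblock and its fields are invented for illustration and are
not the real journal_superblock_s layout, and this is not what the
patch itself does.

```c
#include <assert.h>

/* Hypothetical stand-in for an on-disk structure that must stay 1024 bytes. */
struct demo_superblock {
	unsigned int  s_header[3];	/* 12 bytes, assuming 4-byte unsigned int */
	unsigned int  s_fields[9];	/* 36 bytes */
	unsigned char s_padding[976];	/* pads the structure out to 1024 bytes */
};

/* The build stops here if the layout ever drifts away from 1024 bytes. */
static_assert(sizeof(struct demo_superblock) == 1024,
	      "demo_superblock must be exactly 1024 bytes");
```

Unlike the undefined-function trick, this does not depend on the
compiler optimising away the dead call.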


From adilger at clusterfs.com  Wed Jul 20 15:24:15 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 20 Jul 2005 11:24:15 -0400
Subject: [2.6 patch] fs/jbd/: cleanups
In-Reply-To: <20050719141525.GJ5031@stusta.de>
References: <20050719141525.GJ5031@stusta.de>
Message-ID: <20050720152415.GA6704@schatzie.adilger.int>

On Jul 19, 2005  16:15 +0200, Adrian Bunk wrote:
> This patch contains the following cleanups:
> - make needlessly global functions static
> - journal.c: remove the unused global function __journal_internal_check
>              and move the check to journal_init
> - remove the following write-only global variable:
>   - journal.c: current_journal
> - remove the following unneeded EXPORT_SYMBOL:
>   - journal.c: journal_recover
> 
> Signed-off-by: Adrian Bunk <bunk at stusta.de>
Signed-off-by: Andreas Dilger <adilger at clusterfs.com>

> ---
> 
>  fs/jbd/journal.c    |   34 ++++++++++++++--------------------
>  fs/jbd/revoke.c     |    3 ++-
>  include/linux/jbd.h |    1 -
>  3 files changed, 16 insertions(+), 22 deletions(-)
> 
> --- linux-2.6.12-rc6-mm1-full/include/linux/jbd.h.old	2005-06-14 03:58:20.000000000 +0200
> +++ linux-2.6.12-rc6-mm1-full/include/linux/jbd.h	2005-06-14 04:00:56.000000000 +0200
> @@ -914,7 +912,6 @@
>  extern int	   journal_skip_recovery	(journal_t *);
>  extern void	   journal_update_superblock	(journal_t *, int);
>  extern void	   __journal_abort_hard	(journal_t *);
> -extern void	   __journal_abort_soft	(journal_t *, int);
>  extern void	   journal_abort      (journal_t *, int);
>  extern int	   journal_errno      (journal_t *);
>  extern void	   journal_ack_err    (journal_t *);
> --- linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c.old	2005-06-14 03:58:36.000000000 +0200
> +++ linux-2.6.12-rc6-mm1-full/fs/jbd/revoke.c	2005-06-14 03:58:41.000000000 +0200
> @@ -116,7 +116,8 @@
>  		(block << (hash_shift - 12))) & (table->hash_size - 1);
>  }
>  
> -int insert_revoke_hash(journal_t *journal, unsigned long blocknr, tid_t seq)
> +static int insert_revoke_hash(journal_t *journal, unsigned long blocknr,
> +			      tid_t seq)
>  {
>  	struct list_head *hash_list;
>  	struct jbd_revoke_record_s *record;
> 
> --- linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c.old	2005-07-19 15:53:16.000000000 +0200
> +++ linux-2.6.13-rc3-mm1-full/fs/jbd/journal.c	2005-07-19 15:53:39.000000000 +0200
> @@ -65,7 +65,6 @@ EXPORT_SYMBOL(journal_set_features);
>  EXPORT_SYMBOL(journal_create);
>  EXPORT_SYMBOL(journal_load);
>  EXPORT_SYMBOL(journal_destroy);
> -EXPORT_SYMBOL(journal_recover);
>  EXPORT_SYMBOL(journal_update_superblock);
>  EXPORT_SYMBOL(journal_abort);
>  EXPORT_SYMBOL(journal_errno);
> @@ -81,6 +80,7 @@ EXPORT_SYMBOL(journal_try_to_free_buffer
>  EXPORT_SYMBOL(journal_force_commit);
>  
>  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
> +static void __journal_abort_soft (journal_t *journal, int errno);
>  
>  /*
>   * Helper function used to manage commit timeouts
> @@ -93,16 +93,6 @@ static void commit_timeout(unsigned long
>  	wake_up_process(p);
>  }
>  
> -/* Static check for data structure consistency.  There's no code
> - * invoked --- we'll just get a linker failure if things aren't right.
> - */
> -void __journal_internal_check(void)
> -{
> -	extern void journal_bad_superblock_size(void);
> -	if (sizeof(struct journal_superblock_s) != 1024)
> -		journal_bad_superblock_size();
> -}
> -
>  /*
>   * kjournald: The main thread function used to manage a logging device
>   * journal.
> @@ -119,16 +109,12 @@ void __journal_internal_check(void)
>   *    known as checkpointing, and this thread is responsible for that job.
>   */
>  
> -journal_t *current_journal;		// AKPM: debug
> -
> -int kjournald(void *arg)
> +static int kjournald(void *arg)
>  {
>  	journal_t *journal = (journal_t *) arg;
>  	transaction_t *transaction;
>  	struct timer_list timer;
>  
> -	current_journal = journal;
> -
>  	daemonize("kjournald");
>  
>  	/* Set up an interval timer which can be used to trigger a
> @@ -1441,7 +1427,7 @@ int journal_wipe(journal_t *journal, int
>   * device this journal is present.
>   */
>  
> -const char *journal_dev_name(journal_t *journal, char *buffer)
> +static const char *journal_dev_name(journal_t *journal, char *buffer)
>  {
>  	struct block_device *bdev;
>  
> @@ -1487,7 +1473,7 @@ void __journal_abort_hard(journal_t *jou
>  
>  /* Soft abort: record the abort error status in the journal superblock,
>   * but don't do any other IO. */
> -void __journal_abort_soft (journal_t *journal, int errno)
> +static void __journal_abort_soft (journal_t *journal, int errno)
>  {
>  	if (journal->j_flags & JFS_ABORT)
>  		return;
> @@ -1890,7 +1876,7 @@ EXPORT_SYMBOL(journal_enable_debug);
>  
>  static struct proc_dir_entry *proc_jbd_debug;
>  
> -int read_jbd_debug(char *page, char **start, off_t off,
> +static int read_jbd_debug(char *page, char **start, off_t off,
>  			  int count, int *eof, void *data)
>  {
>  	int ret;
> @@ -1900,7 +1886,7 @@ int read_jbd_debug(char *page, char **st
>  	return ret;
>  }
>  
> -int write_jbd_debug(struct file *file, const char __user *buffer,
> +static int write_jbd_debug(struct file *file, const char __user *buffer,
>  			   unsigned long count, void *data)
>  {
>  	char buf[32];
> @@ -1989,6 +1975,14 @@ static int __init journal_init(void)
>  {
>  	int ret;
>  
> +/* Static check for data structure consistency.  There's no code
> + * invoked --- we'll just get a linker failure if things aren't right.
> + */
> +	extern void journal_bad_superblock_size(void);
> +	if (sizeof(struct journal_superblock_s) != 1024)
> +		journal_bad_superblock_size();
> +
> +
>  	ret = journal_init_caches();
>  	if (ret != 0)
>  		journal_destroy_caches();

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From tuttle at bbs.fsik.cvut.cz  Wed Jul 20 16:42:59 2005
From: tuttle at bbs.fsik.cvut.cz (Vlada Macek)
Date: Wed, 20 Jul 2005 18:42:59 +0200
Subject: ext3 nodump attribute inheritance
Message-ID: <42DE7F13.40503@bbs.cvut.cz>

Hi,

in the past I was considering the ways to back up files in my Linux
home box. The important part of such thinking is how to set exclusion
paths wisely. I finally found out that setting ext2/ext3 nodump
attribute using chattr for files, but mainly directories, suits me
best. Setting the match (regexp) lists for the backup script
externally seems sub-optimal to me, since my dump/nodump data
arrangement is not stable in time.

I started using Schily's star because of its multiple advantages over
the GNU tar, true incremental dumps and honoring the nodump ext2/ext3
flag among others.

But then I noticed what I call an unfortunate feature of ext2/ext3.
The nodump flag of the directory is inherited by every new
file/directory created inside that directory (the same goes for
most of the ext2/ext3 flags). This feature would quickly wipe off my
settings made by hand! Consider for example:

~/tmp/ is nodump. When I create a file inside, it gets nodump too.
When later this file develops into something useful and deserves to
stay under ~/myprogs/ for example, I'll move it, but it still carries
the nodump flag and therefore won't be dumped.

I googled for a while for other users' experience with this feature,
but it seems to me that the nodump flag is not used much, or that people
are either fine with this behaviour or unaware of it.

For myself and for now, I solved this problem with the following
oneline patch against kernel 2.6.8, ialloc.c/ext3_new_inode():

--- kernel-source-2.6.8-orig/fs/ext3/ialloc.c   2004-08-14 07:36:58.000000000 +0200
+++ kernel-source-2.6.8/fs/ext3/ialloc.c        2005-07-19 11:20:36.000000000 +0200
@@ -566,9 +566,9 @@
        ei->i_next_alloc_goal = 0;
        ei->i_dir_start_lookup = 0;
        ei->i_disksize = 0;
 
-       ei->i_flags = EXT3_I(dir)->i_flags & ~EXT3_INDEX_FL;
+       ei->i_flags = EXT3_I(dir)->i_flags & ~(EXT3_INDEX_FL | EXT3_NODUMP_FL);
        if (S_ISLNK(mode))
                ei->i_flags &= ~(EXT3_IMMUTABLE_FL|EXT3_APPEND_FL);
        /* dirsync only applies to directories */
        if (!S_ISDIR(mode))

Now my new files and dirs do not inherit the nodump flag from their parent
dir anymore. However tiny this change is, I consider it useful enough
for others that I am posting it here.

Changing the filesystem behaviour for everyone is of course
problematic. I do not know whether there are (or will be in the
future) other flags deserving selective non-inheritance. So this is
my generalized idea: leave the present behaviour as the default and let
the user configure the mask of flags that should not be inherited from
parent dirs, for example via tune2fs, in the style of chattr. Maybe with
a syntax like this:

    tune2fs -I+-=[ASacDdIijsTtu]

where -I is the new option meaning Inheritance.
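
The generalized idea could be sketched roughly as below.  This is an
illustration only: the DEMO_ names and the function demo_inherit_flags
are placeholders, not kernel symbols, and the no-inherit mask is assumed
to come from some per-filesystem setting that does not exist today.

```c
/* Illustrative flag bits; the names are placeholders for this sketch. */
#define DEMO_IMMUTABLE_FL 0x0010u
#define DEMO_APPEND_FL    0x0020u
#define DEMO_NODUMP_FL    0x0040u
#define DEMO_INDEX_FL     0x1000u

/*
 * Flags a newly created inode would receive: the parent directory's
 * flags minus the configured no-inherit mask.  The index flag is always
 * stripped, as in the one-line patch above.
 */
static unsigned int demo_inherit_flags(unsigned int parent_flags,
				       unsigned int no_inherit_mask)
{
	return parent_flags & ~(DEMO_INDEX_FL | no_inherit_mask);
}
```

With a mask of 0 this gives the present behaviour; with a mask of
DEMO_NODUMP_FL it gives the behaviour of my patch.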

What do you all think? Would you be satisfied with the nodump flag
inheritance if you used the flag for your backups? Am I alone?

Thanks in advance,

-- 

\//\/\
(Sometimes credited as 1494 F8DD 6379 4CD7 E7E3 1FC9 D750 4243 1F05 9424.)


From poiqwepoi at gmail.com  Sat Jul 23 02:14:20 2005
From: poiqwepoi at gmail.com (ESM)
Date: Fri, 22 Jul 2005 22:14:20 -0400
Subject: Recovering lost file...
Message-ID: <38b74966050722191410c04670@mail.gmail.com>

Here goes the story...

I was manipulating a picture (jpg) with gimp and mistakenly saved
the file instead of doing a save as.  I assume that what gimp did was
to write the new file and then unlink the old one.  Now I have found the
address where the original file starts and I can read the first 8192
bytes of it.  I found it by searching for the date and time specified in
the jpeg header, which did not change from the original file to the new
one.  Is there a way to find the missing blocks?
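
For reference, the kind of search described above can be sketched as
follows.  It is a minimal sketch, assuming the raw partition contents
are read into a memory buffer one chunk at a time; the function name
find_pattern is made up here, and it only locates the header, it does
not find the remaining blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of `pat` in `buf`,
 * or -1 if the pattern does not occur.
 */
static long find_pattern(const unsigned char *buf, size_t len,
			 const unsigned char *pat, size_t pat_len)
{
	size_t i;

	assert(buf != NULL && pat != NULL);
	if (pat_len == 0 || pat_len > len)
		return -1;
	for (i = 0; i + pat_len <= len; i++) {
		if (memcmp(buf + i, pat, pat_len) == 0)
			return (long)i;
	}
	return -1;
}
```

A match that straddles two chunks would be missed, so consecutive
chunks would need to overlap by at least pat_len - 1 bytes.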

Thanks in advance.


From tom at thesnail.org  Mon Jul 25 23:50:02 2005
From: tom at thesnail.org (Tom Coleman)
Date: Tue, 26 Jul 2005 09:50:02 +1000
Subject: [Fwd: e2fsck Segmentation Fault]
Message-ID: <1122335402.10813.2.camel@kofi>

Hi,

Somehow I've managed to get e2fsck to seg fault.. The filesystem in
question started acting very strangely (e.g. filenames changing from
music to MuSiC etc) so I rebooted, and since then fsck has crashed every
time it has been run.

I'm not really sure what any of this means, so I didn't know what
debugging output to include, but below is the output of e2fsck.
(I would have attached the results of dumpe2fs, but the mailing list complained).

Thanks for any help; let me know any more information I can provide 
(I am running debian unstable with a new (2.6.11-ac) kernel (although it
segfaulted on an older kernel too))

e2fsck output:
e2fsck 1.38 (30-Jun-2005)
/dev/hde1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Root inode is not a directory.  Clear? yes

Pass 2: Checking directory structure
Entry '..' in ??? (1785857) has deleted/unused inode 12.  Clear? yes

Missing '..' in directory inode 2162689.
Fix? yes

Entry '..' in ... (2162689) has deleted/unused inode 2.  Clear? yes

Missing '..' in directory inode 3129345.
Fix? yes

Entry '..' in ... (3129345) has deleted/unused inode 2.  Clear? yes

Entry '..' in ??? (5931009) has deleted/unused inode 12.  Clear? yes

Missing '..' in directory inode 8601601.
Fix? yes

Entry '..' in ... (8601601) has deleted/unused inode 2.  Clear? yes

Pass 3: Checking directory connectivity
Root inode not allocated.  Allocate? yes

Unconnected directory inode 5931009 (...)
Connect to /lost+found? yes

/lost+found not found.  Create? yes

Unconnected directory inode 1785857 (...)
Connect to /lost+found? yes

Unconnected directory inode 2162689 (...)
Connect to /lost+found? yes

Unconnected directory inode 3129345 (...)
Connect to /lost+found? yes

Unconnected directory inode 8601601 (...)
Connect to /lost+found? yes

Pass 4: Checking reference counts
i_file_acl for inode 11 (...) is 536879104, should be zero.
Clear? yes

i_faddr for inode 11 (...) is 536879104, should be zero.
Clear? yes

i_fsize for inode 11 (...) is 32, should be zero.
Clear? yes

Segmentation fault


From adilger at clusterfs.com  Tue Jul 26 07:30:39 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 26 Jul 2005 01:30:39 -0600
Subject: [Fwd: e2fsck Segmentation Fault]
In-Reply-To: <1122335402.10813.2.camel@kofi>
References: <1122335402.10813.2.camel@kofi>
Message-ID: <20050726073038.GV6126@schatzie.adilger.int>

On Jul 26, 2005  09:50 +1000, Tom Coleman wrote:
> Somehow I've managed to get e2fsck to seg fault.. The filesystem in
> question started acting very strangely (e.g. filenames changing from
> music to MuSiC etc) so I rebooted, and since then fsck has crashed every
> time it has been run.

It appears you are getting single-bit errors, either from your RAM, cable
or internal to the drive.

> Thanks for any help; let me know any more information I can provide 
> (I am running debian unstable with a new (2.6.11-ac) kernel (although it
> segfaulted on an older kernel too))
> 
> e2fsck output:
> e2fsck 1.38 (30-Jun-2005)
> /dev/hde1 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Root inode is not a directory.  Clear? yes

This might be interesting to look at, if only to prove the single-bit
error theory.  If you start debugfs /dev/hde1, and "stat <2>" it should
show what is wrong with the root directory, as will "stat <12>".  It
may well be that they are just corrupted outright, hard to say.

> Pass 2: Checking directory structure
> Entry '..' in ??? (1785857) has deleted/unused inode 12.  Clear? yes
> 
> Missing '..' in directory inode 2162689.
> Fix? yes
> 
> Entry '..' in ... (2162689) has deleted/unused inode 2.  Clear? yes
> 
> Missing '..' in directory inode 3129345.
> Fix? yes
> 
> Entry '..' in ... (3129345) has deleted/unused inode 2.  Clear? yes
> 
> Entry '..' in ??? (5931009) has deleted/unused inode 12.  Clear? yes
> 
> Missing '..' in directory inode 8601601.
> Fix? yes
> 
> Entry '..' in ... (8601601) has deleted/unused inode 2.  Clear? yes
> 
> Pass 3: Checking directory connectivity
> Root inode not allocated.  Allocate? yes
> 
> Unconnected directory inode 5931009 (...)
> Connect to /lost+found? yes
> 
> /lost+found not found.  Create? yes
> 
> Unconnected directory inode 1785857 (...)
> Connect to /lost+found? yes
> 
> Unconnected directory inode 2162689 (...)
> Connect to /lost+found? yes
> 
> Unconnected directory inode 3129345 (...)
> Connect to /lost+found? yes
> 
> Unconnected directory inode 8601601 (...)
> Connect to /lost+found? yes
> 
> Pass 4: Checking reference counts
> i_file_acl for inode 11 (...) is 536879104, should be zero.
> Clear? yes
>
> i_faddr for inode 11 (...) is 536879104, should be zero.
> Clear? yes
> 
> i_fsize for inode 11 (...) is 32, should be zero.
> Clear? yes

These also appear to be single bit errors, 0x20002000 or 0x20.
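
To check such bit patterns by hand, something like the following sketch
could be used.  The function names are made up for illustration and are
not part of e2fsprogs.

```c
#include <stdint.h>

/* Count the bits set in a 32-bit value. */
static int count_set_bits(uint32_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bits that differ between two bytes, e.g. a filename byte before and after. */
static unsigned char differing_bits(unsigned char a, unsigned char b)
{
	return (unsigned char)(a ^ b);
}
```

Fed the values from the log, 536879104 is 0x20002000 (two bits set, one
in each 16-bit half) and 32 is 0x20.  'm' and 'M' also differ only in
bit 0x20, which fits the music/MuSiC renames.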

> Segmentation fault

If you compile a new e2fsck (with -g) and run it under gdb it will tell
you what is going wrong.  Up until here there are only a couple of minor
errors, with / and lost+found.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From adilger at clusterfs.com  Tue Jul 26 07:37:18 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 26 Jul 2005 01:37:18 -0600
Subject: [Fwd: e2fsck Segmentation Fault]
In-Reply-To: <1122335402.10813.2.camel@kofi>
References: <1122335402.10813.2.camel@kofi>
Message-ID: <20050726073718.GX6126@schatzie.adilger.int>

On Jul 26, 2005  09:50 +1000, Tom Coleman wrote:
> Somehow I've managed to get e2fsck to seg fault.. The filesystem in
> question started acting very strangely (e.g. filenames changing from
> music to MuSiC etc) so I rebooted, and since then fsck has crashed every
> time it has been run.

Oh, see also (just posted to ext2-devel):
http://thunk.org/hg/e2fsprogs/?cmd=changeset;node=0502b63a5be9cb490c0c9086fa05edc1b1712a78

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From tom at thesnail.org  Wed Jul 27 04:21:32 2005
From: tom at thesnail.org (Tom Coleman)
Date: Wed, 27 Jul 2005 14:21:32 +1000
Subject: [Fwd: e2fsck Segmentation Fault]
In-Reply-To: <20050726073718.GX6126@schatzie.adilger.int>
References: <1122335402.10813.2.camel@kofi>
	<20050726073718.GX6126@schatzie.adilger.int>
Message-ID: <1122438092.10820.1.camel@kofi>

Bingo. I tried e2fsprogs-1.35 and there was no seg fault.

Thanks heaps for the help. 

P.S I think you were right and the RAM was screwed too.

On Tue, 2005-07-26 at 01:37 -0600, Andreas Dilger wrote:
> On Jul 26, 2005  09:50 +1000, Tom Coleman wrote:
> > Somehow I've managed to get e2fsck to seg fault.. The filesystem in
> > question started acting very strangely (e.g. filenames changing from
> > music to MuSiC etc) so I rebooted, and since then fsck has crashed every
> > time it has been run.
> 
> Oh, see also (just posted to ext2-devel):
> http://thunk.org/hg/e2fsprogs/?cmd=changeset;node=0502b63a5be9cb490c0c9086fa05edc1b1712a78
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
> 
> 


From mag.andersen at gmail.com  Wed Jul 27 21:05:34 2005
From: mag.andersen at gmail.com (Magnus Andersen)
Date: Wed, 27 Jul 2005 17:05:34 -0400
Subject: high context switching and high load averages slowing down system
Message-ID: <5ea1658405072714052ad381d2@mail.gmail.com>

Hi All,

I have a HP DL 580 with 4 3 GHz CPUs and 4 GB RAM.  I'm running Oracle
on it.  Throughout the day I am getting high load averages (6 - 18)
and at the same time I see context switching go over 300,000. 
Sometimes over 500,000.  This is slowing the system down to a crawl.

My OS is RHEL 3 AS Update 4 with the 2.4.21-32.0.1.ELsmp kernel.

Any ideas on why this is happening and how to fix it?

Thanks in advance,
-- 
Magnus Andersen
Systems Administrator / Oracle DBA
Walker & Associates, Inc.


From theman at josephdwagner.info  Wed Jul 27 22:01:11 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Wed, 27 Jul 2005 17:01:11 -0500
Subject: high context switching and high load averages slowing down
	system
In-Reply-To: <5ea1658405072714052ad381d2@mail.gmail.com>
Message-ID: <48vksc$18b7ab4@mxip19a.cluster1.charter.net>

This is the EXT3 File System mailing list.  I don't mean to be rude; it's just that you may get better answers to your questions on another mailing list.

> Any ideas on why this is happening and how to fix it?

Off the top of my head, it sounds like your system is thrashing.  Is there some out-of-control process hogging all the memory?

Some links I found on google.com include:

http://www.unix.org.ua/orelly/oracle/guide8i/ch05_01.htm

The 2.6 kernel has better multitasking capabilities.  You may want to try building a custom kernel with SMP and 4 GB High Memory with Preemptive Multitasking turned on.

But like I said, this is all off the top of my head.

Joseph D. Wagner


From drakoulegonas at kolasi.gr  Mon Jul 25 12:24:19 2005
From: drakoulegonas at kolasi.gr (Professor Stafylopaths)
Date: Mon, 25 Jul 2005 15:24:19 +0300
Subject: Strange corruption (?) problem
Message-ID: <1122294259.42e4d9f34b277@webmail.teilam.gr>


Hello,

I seem to have a strange problem with an ext3 fs partition.
Whenever I transfer several files to this partition and compare
the md5 and sha1 sums, they don't match the originals. It seems
this happens only with large files (~700MB). I am using a vanilla
2.4.27 Linux kernel (I haven't seen any significant changelog entries
up to the latest kernel that might be relevant to my problem).
I've also run e2fsck -C -v -c -f on the partition, which didn't
reveal any problems at all.
This partition was corrupted somehow in the past, but was completely
recovered by fsck, although probably it now makes use of a backup
superblock (when I try to mount the partition without any parameters
the kernel mounts it as ext2 with the warning "EXT2-fs warning (device
ide0(3,65)): ext2_read_super: mounting ext3 filesystem as ext2", so I always
specify the fs type by -t ext3 parameter). I don't know whether this is relevant
to the current problem.

Any ideas?