From tytso at mit.edu  Sat Apr  1 02:41:47 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Fri, 31 Mar 2006 21:41:47 -0500
Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default
In-Reply-To: <1142891937.21593.47.camel@orbit.scot.redhat.com>
References: <20060317075312.GG30801@schatzie.adilger.int>
	<1142634418.3641.62.camel@orbit.scot.redhat.com>
	<20060317143630.300d82f8.akpm@osdl.org>
	<20060318084302.GX30801@schatzie.adilger.int>
	<1142878786.3414.27.camel@orbit.scot.redhat.com>
	<20060320211401.GG6199@schatzie.adilger.int>
	<1142891937.21593.47.camel@orbit.scot.redhat.com>
Message-ID: <20060401024147.GA24163@thunk.org>

On Mon, Mar 20, 2006 at 04:58:57PM -0500, Stephen C. Tweedie wrote:
> 
> I think we're probably at the right point to do so.  Most people who are
> most likely to be affected have a reasonably recent e2fsprogs now.  On
> the Fedora side I'm seeing very few reports of people bitten by
> e2fsprogs incompatibility, and more and more instances of people bitten
> the other way by filesystems not performing as well as expected due to
> missing dir_index flags.
> 

In case some people haven't noticed, a few days ago I released a new
pre-release of e2fsprogs 1.39.  New in this release is a way
for distributions and system administrators to control the default
filesystem features via the /etc/mke2fs.conf file.  

In the pre-release version, mke2fs is still using the same behaviour
as before, but my plan is to change mke2fs to create filesystems with
the dir_index and resize_inode features by default.  People who don't
like this default can always edit mke2fs.conf and change things back.

						- Ted


From HuntressGB at Npt.NUWC.Navy.Mil  Sat Apr  1 14:53:11 2006
From: HuntressGB at Npt.NUWC.Navy.Mil (Huntress Gary B NPRI)
Date: Sat, 01 Apr 2006 09:53:11 -0500
Subject: Tuning for large number of directory entries?
Message-ID: <7F93C0D0C6D8454B9B05720F713A09F138D941@ldap.npt.nuwc.navy.mil>


I have been running a public MySQL server for over 4 years.  The system was a 1GHz box with 256MB of RAM.  MySQL puts each database into a separate directory in a single "data" directory.   Once the old system reached about 10K databases the connection times increased 10X or more.  I attributed the speed problems to a possible filesystem limitation with a large number of files.  That system ran RH9 and either had ext2 or ext3 with no htree patch. 

Recently I bought a new 2.4GHz server with 1GB RAM and much faster drives.  I installed FC4 and am running a 2.6.14 kernel.  I thought that even if my problems were not completely solved, I would at least not see connection times increase until I had many more directory entries (wild guess - 20K).

Since I am writing, you can guess that is not the case.  At 13K directory entries, I am still seeing significantly slower connection times even with the faster hardware and newer software.  

I'm limited in what I can move because MySQL expects everything in the "data" directory.

My questions are:  1)  I don't know much about htree, but I recall that it is supposed to help in this situation.  How do I tell if it is in use on my system?  Is it a kernel module or is it compiled into ext3?   2)  Are there other options for tuning ext3 performance?

Thanks everyone,

Gary Huntress


From adilger at clusterfs.com  Sat Apr  1 18:11:40 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Sat, 1 Apr 2006 11:11:40 -0700
Subject: Tuning for large number of directory entries?
In-Reply-To: <7F93C0D0C6D8454B9B05720F713A09F138D941@ldap.npt.nuwc.navy.mil>
References: <7F93C0D0C6D8454B9B05720F713A09F138D941@ldap.npt.nuwc.navy.mil>
Message-ID: <20060401181140.GR17364@schatzie.adilger.int>

On Apr 01, 2006  09:53 -0500, Huntress Gary B NPRI wrote:
> Since I am writing, you can guess that is not the case.  At 13K
> directory entries, I am still seeing significantly slower connection
> times even with the faster hardware and newer software.

You need to run "e2fsck -f -D" when the filesystem is unmounted, to build
htree indices for your large directories.

> My questions are:  1)  I don't know much about htree, but I recall
> that it is supposed to help in this situation.  How do I tell if it is
> in use on my system?  Is it a kernel module or is it compiled into ext3?

Run "dumpe2fs -h <device> | grep -i features" and look for dir_index.
htree is part of stock 2.6 ext3, not a separate module.

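As a rough illustration of what that dumpe2fs check looks at, here is a minimal C sketch that decodes the compat feature bits from a raw ext2/ext3 superblock. It assumes the superblock has already been read into memory from byte offset 1024 of the device. The field offsets and bit values come from the ext2/ext3 on-disk format. The function names are hypothetical, and dumpe2fs itself uses libext2fs rather than anything like this.

```c
#include <stdint.h>
#include <stdio.h>

/* Little-endian on-disk offsets of the superblock fields used here. */
#define SB_MAGIC_OFF          56   /* __le16 s_magic */
#define SB_FEATURE_COMPAT_OFF 92   /* __le32 s_feature_compat */
#define EXT2_SUPER_MAGIC      0xEF53

/* Compat feature bits from the ext2/ext3 headers. */
#define COMPAT_HAS_JOURNAL    0x0004
#define COMPAT_EXT_ATTR       0x0008
#define COMPAT_RESIZE_INODE   0x0010
#define COMPAT_DIR_INDEX      0x0020

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: 1 if dir_index (htree) is enabled, 0 if not,
 * -1 if the buffer does not look like an ext2/ext3 superblock. */
int sb_has_dir_index(const unsigned char *sb)
{
    if (le16(sb + SB_MAGIC_OFF) != EXT2_SUPER_MAGIC)
        return -1;
    return (le32(sb + SB_FEATURE_COMPAT_OFF) & COMPAT_DIR_INDEX) ? 1 : 0;
}

/* Print the compat feature names this sketch knows about. */
void print_compat_features(const unsigned char *sb)
{
    uint32_t compat = le32(sb + SB_FEATURE_COMPAT_OFF);

    printf("compat features:");
    if (compat & COMPAT_HAS_JOURNAL)  printf(" has_journal");
    if (compat & COMPAT_EXT_ATTR)     printf(" ext_attr");
    if (compat & COMPAT_RESIZE_INODE) printf(" resize_inode");
    if (compat & COMPAT_DIR_INDEX)    printf(" dir_index");
    printf("\n");
}
```

On a real system, the 1024-byte buffer could be filled with `pread(fd, buf, 1024, 1024)` on the device.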
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From hahaha_30k at yahoo.com  Sun Apr  2 06:39:27 2006
From: hahaha_30k at yahoo.com (Robinson Tiemuqinke)
Date: Sat, 1 Apr 2006 22:39:27 -0800 (PST)
Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default
In-Reply-To: <20060401024147.GA24163@thunk.org>
Message-ID: <20060402063927.26510.qmail@web36709.mail.mud.yahoo.com>

Hi,

 A stupid question to ask: 

 How do I turn on the "resize_inode" feature for an ext3 file
system created with an old mke2fs?

  In Fedora Core 4 and Fedora Core 5 the new mke2fs
program creates file systems with the "resize_inode"
feature on by default, but old file systems created
with RH9 and Fedora Core 1 don't have the
"resize_inode" feature. 

 I can't run "tune2fs -O resize_inode" to give old
file systems the new feature after upgrading the OS
from Fedora Core 1 to Fedora Core 5, nor can I
re-create the file systems directly, since I have
important data on them.

 Is there a tool to upgrade old ext3 file systems
so that they will also have the "resize_inode" feature?

Thanks.


> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
> 




From rmy at tigress.co.uk  Sun Apr  2 17:07:21 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Sun, 2 Apr 2006 18:07:21 +0100 (BST)
Subject: Zeroing freed blocks
Message-ID: <200604021707.k32H7LpJ026632@tiffany.internal.tigress.co.uk>

A couple of years ago there was a discussion on lkml under the thread
'PATCH - ext2fs privacy (i.e. secure deletion) patch' about zapping
deleted data in the filesystem as a security mechanism.  The discussion
wandered off into how 'chattr +s' could be implemented and whether
encrypting filesystems wouldn't be a better solution to the problem.

I've been maintaining a simplified version of the patch for a different
reason:  to keep filesystems in files sparse.  Filesystem images for use
by things like user-mode Linux and Xen are often created as sparse files.
After they've been in use for a while their sparseness is reduced even
though they may have lots of free space.  Having the guest kernel fill
deleted blocks with zeros doesn't make the underlying file sparse,
but it does help.  I've got a page with more details:

   http://intgat.tigress.co.uk/rmy/uml/sparsify.html
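Zeroing helps even though it does not punch holes by itself. Host-side tools that recreate sparse files, for example `cp --sparse=always`, can only turn a block into a hole if every byte in it is zero. Below is a minimal C sketch of that test, assuming the image is examined in fixed-size chunks. The function names are hypothetical illustrations, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if all len bytes at p are zero. If p[0] is zero and the buffer
 * equals itself shifted by one byte, then every byte must be zero. */
static int all_zero(const unsigned char *p, size_t len)
{
    return len == 0 || (p[0] == 0 && memcmp(p, p + 1, len - 1) == 0);
}

/* Hypothetical helper: count the whole blksz-sized blocks of buf that are
 * entirely zero, i.e. that a sparse-aware copy could store as holes.
 * A trailing partial block is ignored. */
size_t count_zero_blocks(const unsigned char *buf, size_t len, size_t blksz)
{
    size_t n = 0;

    if (blksz == 0)
        return 0;
    for (size_t off = 0; off + blksz <= len; off += blksz)
        if (all_zero(buf + off, blksz))
            n++;
    return n;
}
```

Freed blocks that still hold stale data never pass this test. That is why filling them with zeros in the guest makes a later sparse copy on the host smaller.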

Anyway, a couple of things:

1. The patch (see below) is pretty simple.  I've been using it for some
   time in UML build systems for old versions of software (rh62, anyone?),
   and today I even tried it for several seconds in a Xen domU kernel.
   It seems to do what I want, but is it any good?

2. The patch is now for ext2 only, the original ext3 version having 
   succumbed to bitrot.  What would it take to implement something
   similar for ext3 these days?

Ron

--- linux-2.6.16/Documentation/filesystems/ext2.txt.zerofree	2006-03-20 05:53:29.000000000 +0000
+++ linux-2.6.16/Documentation/filesystems/ext2.txt	2006-04-02 09:21:52.000000000 +0100
@@ -58,6 +58,8 @@ nobh				Do not attach buffer_heads to fi
 
 xip				Use execute in place (no caching) if possible
 
+zerofree			Zero data blocks when they are freed.
+
 grpquota,noquota,quota,usrquota	Quota options are silently ignored by ext2.
 
 
--- linux-2.6.16/fs/ext2/balloc.c.zerofree	2006-03-20 05:53:29.000000000 +0000
+++ linux-2.6.16/fs/ext2/balloc.c	2006-04-02 09:21:52.000000000 +0100
@@ -174,6 +174,19 @@ static void group_release_blocks(struct 
 	}
 }
 
+static inline void zero_block(struct super_block *sb, unsigned long block)
+{
+	struct buffer_head * bh;
+
+	bh = sb_getblk(sb, block);
+	lock_buffer(bh);
+	memset(bh->b_data, 0, bh->b_size);
+	set_buffer_uptodate(bh);
+	unlock_buffer(bh);
+	mark_buffer_dirty(bh);
+	brelse(bh);
+}
+
 /* Free given blocks, update quota and i_blocks field */
 void ext2_free_blocks (struct inode * inode, unsigned long block,
 		       unsigned long count)
@@ -242,6 +255,9 @@ do_more:
 				"bit already cleared for block %lu", block + i);
 		} else {
 			group_freed++;
+			if (test_opt(sb, ZEROFREE)) {
+				zero_block(sb, block+i);
+			}
 		}
 	}
 
--- linux-2.6.16/fs/ext2/super.c.zerofree	2006-03-20 05:53:29.000000000 +0000
+++ linux-2.6.16/fs/ext2/super.c	2006-04-02 09:21:52.000000000 +0100
@@ -289,7 +289,7 @@ enum {
 	Opt_err_ro, Opt_nouid32, Opt_nocheck, Opt_debug,
 	Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr,
 	Opt_acl, Opt_noacl, Opt_xip, Opt_ignore, Opt_err, Opt_quota,
-	Opt_usrquota, Opt_grpquota
+	Opt_usrquota, Opt_grpquota, Opt_zerofree
 };
 
 static match_table_t tokens = {
@@ -312,6 +312,7 @@ static match_table_t tokens = {
 	{Opt_oldalloc, "oldalloc"},
 	{Opt_orlov, "orlov"},
 	{Opt_nobh, "nobh"},
+	{Opt_zerofree, "zerofree"},
 	{Opt_user_xattr, "user_xattr"},
 	{Opt_nouser_xattr, "nouser_xattr"},
 	{Opt_acl, "acl"},
@@ -395,6 +396,9 @@ static int parse_options (char * options
 		case Opt_nobh:
 			set_opt (sbi->s_mount_opt, NOBH);
 			break;
+		case Opt_zerofree:
+			set_opt (sbi->s_mount_opt, ZEROFREE);
+			break;
 #ifdef CONFIG_EXT2_FS_XATTR
 		case Opt_user_xattr:
 			set_opt (sbi->s_mount_opt, XATTR_USER);
--- linux-2.6.16/include/linux/ext2_fs.h.zerofree	2006-03-20 05:53:29.000000000 +0000
+++ linux-2.6.16/include/linux/ext2_fs.h	2006-04-02 09:21:52.000000000 +0100
@@ -310,6 +310,7 @@ struct ext2_inode {
 #define EXT2_MOUNT_MINIX_DF		0x000080  /* Mimics the Minix statfs */
 #define EXT2_MOUNT_NOBH			0x000100  /* No buffer_heads */
 #define EXT2_MOUNT_NO_UID32		0x000200  /* Disable 32-bit UIDs */
+#define EXT2_MOUNT_ZEROFREE		0x000400  /* Zero freed blocks */
 #define EXT2_MOUNT_XATTR_USER		0x004000  /* Extended user attributes */
 #define EXT2_MOUNT_POSIX_ACL		0x008000  /* POSIX Access Control Lists */
 #define EXT2_MOUNT_XIP			0x010000  /* Execute in place */


From keld at dkuug.dk  Sun Apr  2 20:37:01 2006
From: keld at dkuug.dk (Keld =?iso-8859-1?Q?J=F8rn?= Simonsen)
Date: Sun, 2 Apr 2006 22:37:01 +0200
Subject: Zeroing freed blocks
In-Reply-To: <200604021707.k32H7LpJ026632@tiffany.internal.tigress.co.uk>
References: <200604021707.k32H7LpJ026632@tiffany.internal.tigress.co.uk>
Message-ID: <20060402203701.GB14104@rap.rap.dk>

On Sun, Apr 02, 2006 at 06:07:21PM +0100, Ron Yorston wrote:
> A couple of years ago there was a discussion on lkml under the thread
> 'PATCH - ext2fs privacy (i.e. secure deletion) patch' about zapping
> deleted data in the filesystem as a security mechanism.  The discussion
> wandered off into how 'chattr +s' could be implemented and whether
> encrypting filesystems wouldn't be a better solution to the problem.
> 
> I've been maintaining a simplified version of the patch for a different
> reason:  to keep filesystems in files sparse.  Filesystem images for use
> by things like user-mode Linux and Xen are often created as sparse files.
> After they've been in use for a while their sparseness is reduced even
> though they may have lots of free space.  Having the guest kernel fill
> deleted blocks with zeros doesn't make the underlying file sparse,
> but it does help.  I've got a page with more details:
> 
>    http://intgat.tigress.co.uk/rmy/uml/sparsify.html
> 
> Anyway, a couple of things:
> 
> 1. The patch (see below) is pretty simple.  I've been using it for some
>    time in UML build systems for old versions of software (rh62, anyone?),
>    and today I even tried it for several seconds in a Xen domU kernel.
>    It seems to do what I want, but is it any good?
> 
> 2. The patch is now for ext2 only, the original ext3 version having 
>    succumbed to bitrot.  What would it take to implement something
>    similar for ext3 these days?

Well, I think this should be optional, if included. It does directly
counteract the patch I recently sent to salvage files from their data
blocks in ext2/ext3. 

Best regards
keld


From tytso at mit.edu  Sun Apr  2 14:14:14 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Sun, 2 Apr 2006 10:14:14 -0400
Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default
In-Reply-To: <20060402063927.26510.qmail@web36709.mail.mud.yahoo.com>
References: <20060401024147.GA24163@thunk.org>
	<20060402063927.26510.qmail@web36709.mail.mud.yahoo.com>
Message-ID: <20060402141414.GA7745@thunk.org>

On Sat, Apr 01, 2006 at 10:39:27PM -0800, Robinson Tiemuqinke wrote:
>  A stupid question to ask: 
> 
>  How do I turn on the "resize_inode" feature for an ext3 file
> system created with an old mke2fs?
> 
>  Is there a tool to upgrade old ext3 file systems
> so that they will also have the "resize_inode" feature?

If you download the ext2resize package from SourceForge, it has a
program called "ext2prepare" which will do this.  Neither Stephen, when
he was integrating on-line resizing for Fedora/Red Hat Enterprise
Linux, nor I, when considering how to integrate this functionality into
e2fsprogs, were comfortable enough with the code base to accept
responsibility for maintaining it in its current form, which is why
that functionality has not yet appeared in either RHEL4 or e2fsprogs.

That being said, I'm not aware of anyone losing data, or of any other
serious bugs, in the ext2prepare program in ext2resize, aside from
the fact that it has portability problems on big-endian systems.
(One of ext2resize's problems is that it doesn't use libext2fs, but
rather rolls its own library functions, which clearly were never
tested on big-endian systems and which also have application-level
functionality folded into them, making a port to libext2fs more
difficult than it ought to have been.)

In any case, since the ability to request on-line resizing is now
integrated into resize2fs, the only functionality still found only in
ext2resize is the ext2prepare program, which reserves space on an
already-created ext3 filesystem.  Adding this support is on my todo list, but to be honest
other development items for e2fsprogs are higher priority at the
moment.  If someone wants to try writing an ext2prepare-like program
using libext2fs, let me know, and I can give you an outline of what
needs to be done.

						- Ted


From fk at linuxburg.de  Mon Apr  3 09:40:36 2006
From: fk at linuxburg.de (Felix E. Klee)
Date: Mon, 3 Apr 2006 11:40:36 +0200
Subject: Can copying a file damage the original?
Message-ID: <200604031140.37018.fk@linuxburg.de>

Consider the following scenario:

* A database is accessing a large file $a on an Ext3FS, writing to it, reading
  from it.

* While testing a backup script, the file $a is copied with rsync without
  prior shutdown of the database software.

Here's what just happened under this scenario:

  $a got damaged.

I'm certain that this is just a coincidence.  However, my employer recalls 
hearing other people state that copying files while they are in use may 
damage the original.  I doubt that these other people have a clue, but 
perhaps it's me who doesn't have a clue: Are there any circumstances under 
which a source file in a copy operation can be damaged?

-- 
Dipl.-Phys. Felix E. Klee
Email: fk at linuxburg.de (work), felix.klee at inka.de (home)
Tel: +49 721 8307937, Fax: +49 721 8307936
Linuxburg, Goethestr. 15a, 76135 Karlsruhe, Germany


From sct at redhat.com  Mon Apr  3 20:06:58 2006
From: sct at redhat.com (Stephen C. Tweedie)
Date: Mon, 03 Apr 2006 16:06:58 -0400
Subject: FC5: "ext_attr" and "large_file" features for ext3 file
	systems ???
In-Reply-To: <20060328215257.28237.qmail@web36702.mail.mud.yahoo.com>
References: <20060328215257.28237.qmail@web36702.mail.mud.yahoo.com>
Message-ID: <1144094818.9387.7.camel@orbit.scot.redhat.com>

Hi,

On Tue, 2006-03-28 at 13:52 -0800, Robinson Tiemuqinke wrote:

>  First, what does the "large_file" feature REALLY mean?
> Then, what file size turns this
> feature on? 2GB, or 2TB? 

2GB.
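In arithmetic terms, the old signed 32-bit size limit is 2^31 - 1 = 2147483647 bytes. The flag becomes necessary once any file grows past that. Below is a minimal C sketch of the check. It is assumed to mirror the test ext3 applies when it writes an inode's on-disk size (`i_disksize > 0x7fffffff`). The function itself is a hypothetical illustration, not kernel code.

```c
#include <stdint.h>

/* Largest size representable in a signed 32-bit off_t: 2^31 - 1 bytes. */
#define PRE_LFS_MAX_SIZE 0x7FFFFFFFULL

/* Hypothetical helper: does a file of this many bytes require the
 * large_file (RO_COMPAT_LARGE_FILE) feature on the filesystem? */
int needs_large_file(uint64_t size)
{
    return size > PRE_LFS_MAX_SIZE;
}
```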

>  Second, the "ext_attr" feature seems to be another
> automatic one: it only appears after the first
> "setfacl" command runs on the file system, and then the
> feature stays there forever, even after the ACL is
> removed. What does the "ext_attr" feature indicate,
> and what are the reasons for having this feature?

They are there simply to indicate that a given feature is present on the
filesystem.  They prevent old versions of the kernel and/or e2fsck tools
from mistakenly operating on a filesystem with newer features,
potentially corrupting things on disk or returning incorrect file
contents.

All remotely recent kernels have large file support for ext3, and all
2.6 ones (and many vendor-supplied 2.4 ones) have ext_attr, so you have
to be running something pretty old to run into compatibility problems
with either of those features.
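The protection Stephen describes works through three feature masks in the superblock. At mount time:

- unknown "incompat" features make the kernel refuse to mount;
- unknown "ro_compat" features (large_file is one) allow a read-only mount only;
- unknown "compat" features (ext_attr is one) are ignored.

e2fsck is assumed here to be stricter and to refuse any feature it does not recognise. Below is a minimal C sketch of the mount-time rule; the enum, struct, and function names are hypothetical.

```c
#include <stdint.h>

enum mount_verdict { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSE };

/* Feature masks as stored in a superblock, or as understood by a kernel. */
struct features {
    uint32_t compat;
    uint32_t incompat;
    uint32_t ro_compat;
};

/* Hypothetical helper modelling the mount-time decision:
 *  - any unknown incompat bit  -> refuse to mount at all
 *  - any unknown ro_compat bit -> read-only mount only
 *  - unknown compat bits       -> ignored */
enum mount_verdict check_features(struct features fs, struct features supported)
{
    if (fs.incompat & ~supported.incompat)
        return MOUNT_REFUSE;
    if (fs.ro_compat & ~supported.ro_compat)
        return MOUNT_RO_ONLY;
    return MOUNT_RW;
}
```

For example, an older kernel that knows only has_journal, filetype, and sparse_super could still mount a filesystem with ext_attr read-write. If large_file were also set, it could mount it read-only only.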

--Stephen


From maillists at hosttuls.com  Tue Apr  4 00:16:28 2006
From: maillists at hosttuls.com (Brandon Evans)
Date: Mon, 03 Apr 2006 17:16:28 -0700
Subject: Filesystem too large...
Message-ID: <4431BADC.8000403@hosttuls.com>


I need to set up a 3.27TB ext3 filesystem using -i 1024 and -b 1024.

When I try to format this partition I get the "Filesystem too large." 
error.  Are there any plans to update these limits?  Are there any 
patches already available that I can try out?  Or am I just SOL here?


(vzbu2 ~)# fdisk -l /dev/etherd/e1.1

Disk /dev/etherd/e1.1: 3600.7 GB, 3600795892224 bytes
255 heads, 63 sectors/track, 437771 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/etherd/e1.1 doesn't contain a valid partition table


(vzbu2 ~)# lvdisplay
   --- Logical volume ---
   LV Name                /dev/lvg01/vz
   VG Name                lvg01
   LV UUID                CH5TEA-WC61-oSMX-olxz-sBTf-L1Ho-E1740u
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                3.27 TB
   Current LE             858496
   Segments               1
   Allocation             inherit
   Read ahead sectors     0
   Block device           253:0


mkfs.ext3 -i 1024 -b 1024 /dev/lvg01/vz
mke2fs 1.35 (28-Feb-2004)
mkfs.ext3: Filesystem too large.  No more than 2**31-1 blocks
          (8TB using a blocksize of 4k) are currently supported.


-- 

   Brandon Evans

"I have a theory that the truth is never told during the nine-to-five 
hours."
-Hunter S. Thompson


From hahaha_30k at yahoo.com  Tue Apr  4 02:03:21 2006
From: hahaha_30k at yahoo.com (Robinson Tiemuqinke)
Date: Mon, 3 Apr 2006 19:03:21 -0700 (PDT)
Subject: FC5: "ext_attr" and "large_file" features for ext3 file systems
	???
In-Reply-To: <1144094818.9387.7.camel@orbit.scot.redhat.com>
Message-ID: <20060404020321.31102.qmail@web36701.mail.mud.yahoo.com>


Thanks a lot.

Another question is: 

 Do I have to run "e2fsck -y -D" on a file system to
activate the "dir_index" feature? 

 I have bunches of old ext3 file systems created with
old versions of mkfs.ext3.  After upgrading to
Fedora Core 5, I ran "tune2fs -O dir_index" to turn
on the feature, but it is rumored that I have
to run "e2fsck -y -D" after unmounting the old ext3 file
systems so that new file and directory creations will
use the hashed B-tree. 

 Is that correct?  If I don't run "e2fsck -y -D", will the
original linear directory structure still be in
effect even though I turned on the "dir_index" feature with
tune2fs?  In that case, what are the potential effects
on the underlying old ext3 file systems?

Thanks a lot.
 





From adilger at clusterfs.com  Tue Apr  4 07:01:21 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 4 Apr 2006 01:01:21 -0600
Subject: FC5: "ext_attr" and "large_file" features for ext3 file systems
	???
In-Reply-To: <20060404020321.31102.qmail@web36701.mail.mud.yahoo.com>
References: <1144094818.9387.7.camel@orbit.scot.redhat.com>
	<20060404020321.31102.qmail@web36701.mail.mud.yahoo.com>
Message-ID: <20060404070121.GK17364@schatzie.adilger.int>

On Apr 03, 2006  19:03 -0700, Robinson Tiemuqinke wrote:
>  Do I have to run "e2fsck -y -D" on a file system to
> activate the "dir_index" feature? 

You do not HAVE to run this, as new directories and existing directories
that grow larger than one block (normally 4kB) will start to use the
directory indexing feature.  However, to use dir indexing on existing
large directories you do need to use "e2fsck -f -D".  This will also
"pack" large directories that have had most of the files deleted out,
AFAIR.
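Below is a rough model of the behaviour described above. It is assumed from the ext3 directory code of this era and is worth checking against fs/ext3/namei.c. Adding an entry goes one of four ways:

- through the htree if the directory is already indexed (and dir_index is on);
- into any existing block that has room;
- by converting the directory to an indexed one, if it is a single full block and dir_index is on;
- otherwise by appending another linear block.

So a large directory that was never indexed stays linear until "e2fsck -f -D" rebuilds it. The struct, enum, and function names are hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a directory for this illustration. */
struct dir_state {
    bool indexed;        /* directory inode carries the htree index flag */
    unsigned blocks;     /* directory size in filesystem blocks */
    bool has_free_slot;  /* some existing block has room for the new entry */
};

enum add_path { ADD_VIA_HTREE, ADD_IN_PLACE, CONVERT_TO_HTREE, APPEND_LINEAR_BLOCK };

/* Which path would adding one entry take? */
enum add_path add_entry_path(struct dir_state d, bool dir_index_feature)
{
    if (d.indexed && dir_index_feature)
        return ADD_VIA_HTREE;
    if (d.has_free_slot)
        return ADD_IN_PLACE;         /* linear directory, entry fits */
    if (d.blocks == 1 && dir_index_feature)
        return CONVERT_TO_HTREE;     /* a full one-block directory gets indexed */
    return APPEND_LINEAR_BLOCK;      /* big unindexed directories stay linear */
}
```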


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From adilger at clusterfs.com  Tue Apr  4 06:56:21 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 4 Apr 2006 00:56:21 -0600
Subject: Filesystem too large...
In-Reply-To: <4431BADC.8000403@hosttuls.com>
References: <4431BADC.8000403@hosttuls.com>
Message-ID: <20060404065621.GJ17364@schatzie.adilger.int>

On Apr 03, 2006  17:16 -0700, Brandon Evans wrote:
> I need to setup a 3.27TB ext3 filesystem using -i 1024 and -b 1024.
> 
> When I try to format this partition I get the "Filesystem too large." 
> error.  Are there any plans to update these limits?  are there any 
> patches already available that I can try out?  Or am I just SOL here?

The same patches that have been posted here (or maybe ext2-devel?)
to increase the fs size to 16TB are applicable in your case.  They
are experimental at this stage, however, but as always, testing is
welcome.

The other question is why you want to have a 3TB filesystem with 1kB
blocks, unless you are consistently creating very small files...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From sct at redhat.com  Tue Apr  4 18:10:10 2006
From: sct at redhat.com (Stephen C. Tweedie)
Date: Tue, 04 Apr 2006 14:10:10 -0400
Subject: Filesystem too large...
In-Reply-To: <20060404065621.GJ17364@schatzie.adilger.int>
References: <4431BADC.8000403@hosttuls.com>
	<20060404065621.GJ17364@schatzie.adilger.int>
Message-ID: <1144174210.3411.24.camel@orbit.scot.redhat.com>

Hi,

On Tue, 2006-04-04 at 00:56 -0600, Andreas Dilger wrote:
> On Apr 03, 2006  17:16 -0700, Brandon Evans wrote:
> > I need to setup a 3.27TB ext3 filesystem using -i 1024 and -b 1024.
> > 
> > When I try to format this partition I get the "Filesystem too large." 
> > error.  Are there any plans to update these limits?  are there any 
> > patches already available that I can try out?  Or am I just SOL here?
> 
> The same patches that have been posted here (or maybe ext2-devel?)
> to increase the fs size to 16TB are applicable in your case.

Yes; just note that with a 1k blocksize, 2^32 blocks will only get you
as far as 4TB, not 16TB.  But yes, it should work.
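Stephen's arithmetic, together with the 2^31 - 1 block cap from the mke2fs 1.35 error message, can be checked in a few lines. Below is a minimal C sketch that uses the 3600795892224-byte disk size from the fdisk output as input. The LV is a little smaller, which does not change the outcome. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Block-count cap quoted in the mke2fs 1.35 error message. */
#define MKE2FS_135_MAX_BLOCKS ((1ULL << 31) - 1)

/* Number of whole blocks of blksz bytes that fit on a device. */
uint64_t fs_blocks(uint64_t dev_bytes, uint32_t blksz)
{
    return dev_bytes / blksz;
}

/* Hypothetical helper: does this size/blocksize pair exceed max_blocks? */
int too_many_blocks(uint64_t dev_bytes, uint32_t blksz, uint64_t max_blocks)
{
    return fs_blocks(dev_bytes, blksz) > max_blocks;
}
```

With 1k blocks, that device needs 3516402238 blocks. That is more than 2^31 - 1 (2147483647) but fewer than 2^32, which is why the larger-filesystem patches should allow it. With 4k blocks it needs only 879100559.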

However, 1k blocksize is usually a bad idea unless you really need the
very very best space efficiency on the filesystem: it usually performs
worse than 4k blocksize, and it imposes other limits such as a maximum
file size of a bit over 16GB.  With 4k blocksize, a 3.27TB filesystem
should just work.
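The "bit over 16GB" comes from the block map. An inode has 12 direct block pointers plus single, double, and triple indirect blocks, and each indirect block holds blocksize/4 four-byte block numbers. With 1k blocks that is 12 + 256 + 256^2 + 256^3 = 16843020 addressable blocks, or 17247252480 bytes (about 16.06 GiB). Below is a minimal C sketch of that sum; the function name is hypothetical. It covers only the block-map addressing limit. With larger block sizes, other limits take over, such as the 32-bit i_blocks counter in 512-byte units that caps files near 2TiB. The kernel's ext3_max_size() is assumed to apply such a clamp on top of the same sum.

```c
#include <stdint.h>

#define NDIR_BLOCKS 12   /* direct block pointers in an ext2/ext3 inode */

/* Hypothetical helper: bytes addressable through the direct, single-,
 * double- and triple-indirect pointers for a given block size.
 * Ignores the separate clamps applied on top of this. */
uint64_t blockmap_max_bytes(uint32_t blksz)
{
    uint64_t per = blksz / 4;   /* 32-bit block numbers per indirect block */
    uint64_t blocks = NDIR_BLOCKS + per + per * per + per * per * per;

    return blocks * blksz;
}
```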

--Stephen


From maillists at hosttuls.com  Tue Apr  4 21:22:20 2006
From: maillists at hosttuls.com (Brandon Evans)
Date: Tue, 04 Apr 2006 14:22:20 -0700
Subject: Filesystem too large...
In-Reply-To: <20060404065621.GJ17364@schatzie.adilger.int>
References: <4431BADC.8000403@hosttuls.com>
	<20060404065621.GJ17364@schatzie.adilger.int>
Message-ID: <4432E38C.1070506@hosttuls.com>

Andreas Dilger wrote:
> On Apr 03, 2006  17:16 -0700, Brandon Evans wrote:
>> I need to setup a 3.27TB ext3 filesystem using -i 1024 and -b 1024.
>>
>> When I try to format this partition I get the "Filesystem too large." 
>> error.  Are there any plans to update these limits?  are there any 
>> patches already available that I can try out?  Or am I just SOL here?

> The other question is why you want to have a 3TB filesystem with 1kB
> blocks, unless you are consistently creating very small files...

The server I am preparing is a SWsoft Virtuozzo backup server, which 
requires 1kB blocks.  The small blocks are needed for the magic links 
it uses in the virtual environment.


-- 

   Brandon Evans

"I have a theory that the truth is never told during the nine-to-five 
hours."
-Hunter S. Thompson


From maillists at hosttuls.com  Tue Apr  4 22:44:13 2006
From: maillists at hosttuls.com (Brandon Evans)
Date: Tue, 04 Apr 2006 15:44:13 -0700
Subject: Filesystem too large...
In-Reply-To: <1144174210.3411.24.camel@orbit.scot.redhat.com>
References: <4431BADC.8000403@hosttuls.com>	
	<20060404065621.GJ17364@schatzie.adilger.int>
	<1144174210.3411.24.camel@orbit.scot.redhat.com>
Message-ID: <4432F6BD.7080201@hosttuls.com>

Stephen C. Tweedie wrote:
> Hi,
> 
> On Tue, 2006-04-04 at 00:56 -0600, Andreas Dilger wrote:
>> On Apr 03, 2006  17:16 -0700, Brandon Evans wrote:
>>> I need to setup a 3.27TB ext3 filesystem using -i 1024 and -b 1024.
>>>
>>> When I try to format this partition I get the "Filesystem too large." 
>>> error.  Are there any plans to update these limits?  are there any 
>>> patches already available that I can try out?  Or am I just SOL here?
>> The same patches that have been posted here (or maybe ext2-devel?)
>> to increase the fs size to 16TB are applicable in your case.
> 
> Yes; just note that with a 1k blocksize, 2^32 blocks will only get you
> as far as 4TB, not 16TB.  But yes, it should work.
> 
> However, 1k blocksize is usually a bad idea unless you really need the
> very very best space efficiency on the filesystem: it usually performs
> worse than 4k blocksize, and it imposes other limits such as a maximum
> file size of a bit over 16GB.  With 4k blocksize, a 3.27TB filesystem
> should just work.


I should mention I have tried this on 2.6.14 and 2.6.8.  From what I 
have found, it seems these kernels should already have the 16TB file 
system support.  Perhaps I am looking in the wrong place.  Any help 
finding this patch would be appreciated.


-- 

   Brandon Evans

"I have a theory that the truth is never told during the nine-to-five 
hours."
-Hunter S. Thompson


From talk2sumit at gmail.com  Thu Apr  6 06:37:33 2006
From: talk2sumit at gmail.com (Sumit Narayan)
Date: Thu, 6 Apr 2006 14:37:33 +0800
Subject: deleting partition does not effect superblock?
Message-ID: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>

Hi,

I am using kernel 2.6.15.4.

On my system, I first created a partition with EXT3 and put some data
on it. Later, I deleted the partition, and re-created another
partition with the same starting block number and a higher ending
block number. I intended to format it with another filesystem, but
surprisingly (or maybe just to me), the superblock of the partition
had not changed. I could still mount the new partition as the same old
filesystem. I could see all the files that were present earlier. Doing
'df' showed me the older partition details (size, % used etc.).

Shouldn't the superblock be changed/deleted once the partition is
deleted? I tried a reboot, but the output remained the same.

-- Sumit


From menscher at uiuc.edu  Thu Apr  6 07:31:31 2006
From: menscher at uiuc.edu (Damian Menscher)
Date: Thu, 6 Apr 2006 02:31:31 -0500 (CDT)
Subject: deleting partition does not effect superblock?
In-Reply-To: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
References: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
Message-ID: <Pine.LNX.4.63.0604060228560.8287@zeus.itg.uiuc.edu>

On Thu, 6 Apr 2006, Sumit Narayan wrote:

> On my system, I first created a partition with EXT3 and put some data
> on it. Later, I deleted the partition, and re-created another
> partition with the same starting block number and a higher ending
> block number. I intended to format it with another filesystem, but
> surprisingly (or maybe just to me), the superblock of the partition
> had not changed. I could still mount the new partition as the same old
> filesystem. I could see all the files that were present earlier. Doing
> 'df' showed me the older partition details (size, % used etc.).
>
> Shouldn't the superblock be changed/deleted once the partition is
> deleted? I tried a reboot, but the output remained the same.

This is the expected behavior.  A filesystem is created within the 
partition.  If you grow the partition, the filesystem doesn't 
automatically grow (use resize2fs for that).  In fact, you should 
probably read the resize2fs manpage, as it might give you some starting 
clue of what's going on.

Damian Menscher
-- 
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Ofc:(650)253-2757 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-


From maillists at hosttuls.com  Thu Apr  6 21:12:06 2006
From: maillists at hosttuls.com (Brandon Evans)
Date: Thu, 06 Apr 2006 14:12:06 -0700
Subject: Filesystem too large...
In-Reply-To: <4432E38C.1070506@hosttuls.com>
References: <4431BADC.8000403@hosttuls.com>
	<20060404065621.GJ17364@schatzie.adilger.int>
	<4432E38C.1070506@hosttuls.com>
Message-ID: <44358426.2030500@hosttuls.com>

Brandon Evans wrote:
> Andreas Dilger wrote:

> The server I am preparing is an SWsoft Virtuozzo backup server, which
> requires 1kB blocks.  The small blocks are needed for the magic links
> it uses in the virtual environment.
> 
> 
It turns out the 1kB block size is not 100% necessary for Virtuozzo, so
I just formatted with 4kB and moved on.


-- 

   Brandon Evans

"I have a theory that the truth is never told during the nine-to-five 
hours."
-Hunter S. Thompson


From jbglaw at lug-owl.de  Thu Apr  6 06:58:32 2006
From: jbglaw at lug-owl.de (Jan-Benedict Glaw)
Date: Thu, 6 Apr 2006 08:58:32 +0200
Subject: deleting partition does not effect superblock?
In-Reply-To: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
References: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
Message-ID: <20060406065832.GK13324@lug-owl.de>

On Thu, 2006-04-06 14:37:33 +0800, Sumit Narayan <talk2sumit at gmail.com> wrote:
> Shouldn't the superblock be changed/deleted once the partition is
> deleted? I tried a reboot, but the output remained the same.

No, everything you see is working as expected.  A partition is only a
container (as are "disks", "volume groups", "RAID arrays",
"logical volumes", "image files" etc.)

Whenever you destroy such a container, its contents are not deleted or
otherwise modified. So it's perfectly okay to delete such
a container (e.g. remove start and end from the partition table) and
recreate it at some time later (by adding those values back to the
partition table.)  As long as the new container starts at the same
location, a filesystem driver will be able to find the old
information. If you start a block later, it won't find its
superblocks.
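This explains why a recreated partition with the same start sector still mounts with its old size. The primary ext2/ext3 superblock sits at a fixed byte offset (1024) from the start of the container, and it records the filesystem's own block count independently of the partition table. Here is a minimal Python sketch. It assumes the standard little-endian on-disk layout: s_blocks_count at offset 4, s_log_block_size at 24, and s_magic (0xEF53) at 56 within the superblock. It reads a synthetic byte buffer rather than a real device.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1 KiB into the container
EXT2_MAGIC = 0xEF53

def read_ext2_superblock(data: bytes):
    """Return (blocks_count, block_size) if data starts an ext2/ext3 fs, else None."""
    sb = data[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        return None
    (blocks_count,) = struct.unpack_from("<I", sb, 4)     # s_blocks_count
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT2_MAGIC:
        return None
    return blocks_count, 1024 << log_block_size

# Synthetic image: only the three fields read above are filled in.
image = bytearray(4096)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 4, 9277344)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 24, 2)         # 1024 << 2 = 4096
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)

print(read_ext2_superblock(bytes(image)))        # same start: (9277344, 4096)
print(read_ext2_superblock(bytes(image[512:])))  # start one sector later: None
```

The stored block count, not the partition size, is what df reports after mounting. This is why the filesystem has to be grown with resize2fs after the partition is enlarged.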

Finally, you have several ways to keep the old data from being found
again.  Most probably, you'd just zero it out before deleting the
partition, with something like:

# cat /dev/zero > /dev/hda3

(of course with the correct device name!)

Regards, JBG

-- 
Jan-Benedict Glaw       jbglaw at lug-owl.de    . +49-172-7608481             _ O _
"A Free Opinion in a Free Mind            | Against censorship | Against war  _ _ O
 for a Free State full of Free Citizens"  | on the Internet!   |   in Iraq!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

From hahaha_30k at yahoo.com  Fri Apr  7 07:20:48 2006
From: hahaha_30k at yahoo.com (Robinson Tiemuqinke)
Date: Fri, 7 Apr 2006 00:20:48 -0700 (PDT)
Subject: How to interpret the output of  'iostat -x /dev/sdb1 20 100' ?? 
Message-ID: <20060407072048.17474.qmail@web36709.mail.mud.yahoo.com>

Hi,
 
 I'm a newbie to the 'iostat' tool, and I've read the
iostat manual several times, but it doesn't help.
I still get confused by the output of 'iostat'; the
manual seems too abstract, or high-level, for me.

Here is the output first:

avg-cpu:  %user   %nice    %sys   %idle
           5.70    0.00    3.15   91.15

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
/dev/sdb1    0.60   4.70 12.60  1.50  105.60   49.60    52.80    24.80    11.01     1.54   10.92   8.65  12.20


I'll ask about rrqm/s, r/s, rsec/s, avgrq-sz,
avgqu-sz, await, svctm and %util in the above output.

First question: How many physical disk read requests
are sent to the hard drive by the kernel driver? Is it
the difference (r/s - rrqm/s), or just r/s? If it is
r/s, does that mean user and system applications send
(r/s + rrqm/s) read requests to the kernel per second?

Second question: (r/s * avgrq-sz) is 30% bigger than
rsec/s. Why? They should be equal, or differ only
slightly because of rounding.

Third question: What is the unit of avgqu-sz? Is it
unitless, sectors, or something else? If it is
unitless, does that mean the unit is 'read requests'?

4th question: Is (await + svctm) the time span for a
read request from being dispatched (by the kernel
driver) to being served? If so, could we use this
number as a criterion for (disk + file system)
performance?

5th question: %util is a percentage of which CPU time?
The manual is too abstract here. Does it mean (disk
I/O operation time) divided by (%user + %nice + %sys)?
Or is it (%user + %nice + %sys) divided by all the
system time slots (%user + %nice + %sys + %idle)?

I'm completely lost here. Please help.

Thanks a lot.
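The arithmetic behind questions 2 and 5 can be cross-checked directly from the sample. Here is a minimal Python sketch. It assumes the field meanings described in the sysstat iostat(1) manual: avgrq-sz is the average size, in sectors, of all requests issued (reads and writes together), and svctm is the average service time in milliseconds. It also assumes %util reflects the time the device had I/O in flight (elapsed time, despite the manual's "CPU time" wording), and that sectors are 512 bytes.

```python
# Figures copied from the iostat -x sample above (one interval).
r_s, w_s = 12.60, 1.50            # read / write requests completed per second
rsec_s, wsec_s = 105.60, 49.60    # sectors read / written per second
rkb_s = 52.80                     # kB read per second
avgrq_sz = 11.01                  # average request size in sectors (reads + writes)
svctm_ms = 8.65                   # average service time per request, ms
util_pct = 12.20                  # %util as printed

# Question 2: multiplying by r/s alone overshoots, because avgrq-sz
# is averaged over read AND write requests.
reads_only = r_s * avgrq_sz
all_requests = (r_s + w_s) * avgrq_sz
print(f"r/s * avgrq-sz       = {reads_only:.1f}  vs rsec/s = {rsec_s}")
print(f"(r/s+w/s) * avgrq-sz = {all_requests:.1f}  vs rsec/s + wsec/s = {rsec_s + wsec_s:.1f}")

# 512-byte sectors: kB/s should be half of sectors/s.
print(f"rsec/s / 2 = {rsec_s / 2:.1f}  vs rkB/s = {rkb_s}")

# Question 5: requests/s * ms per request = busy ms per second;
# dividing by 10 turns that into a percentage of elapsed time.
busy_pct = (r_s + w_s) * svctm_ms / 10
print(f"(r/s+w/s) * svctm / 10 = {busy_pct:.1f}%  vs %util = {util_pct}%")
```

The first two lines suggest the 30% gap comes from writes being included in avgrq-sz. The last line agrees with %util largely by construction, since iostat derives svctm from the same per-device busy-time counter.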




From jerume at assiniemafia.com  Sun Apr  9 03:01:23 2006
From: jerume at assiniemafia.com (jerume)
Date: Sun, 09 Apr 2006 05:01:23 +0200
Subject: Table creation failed
Message-ID: <44387903.5090207@assiniemafia.com>

Hello,

I am writing because there is something I don't understand:

I'm using udev on Debian sid with a 2.6.15.1 kernel.

I have created a degraded RAID at /dev/md0.
When I ran mkfs.ext3 /dev/md0, I got:

mke2fs 1.39-WIP (29-Mar-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
4643968 inodes, 9277344 blocks
463867 blocks (5.00%) reserved for the super user
First data block=0
284 block groups
32768 blocks per group, 32768 fragments per group
16352 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Writing inode tables: done                           
Creating journal (32768 blocks): mkfs.ext3: Device or resource busy
        while trying to create journal

zsh: exit 1     mkfs.ext3 /dev/md0

Could you please help me?

Thanks for open source.

Jérôme. ;)
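The summary lines of mke2fs output like the above are related by simple arithmetic. Here is a minimal Python sketch that re-derives them from the printed parameters. It assumes ceiling division for the group count and a truncated 5% reserve (the mke2fs default). It only explains the numbers and does not diagnose the journal error.

```python
# Figures copied from the mke2fs output above.
blocks = 9277344
blocks_per_group = 32768
inodes_per_group = 16352
first_data_block = 0
reserved_percent = 5           # mke2fs default (-m 5)
block_size = 4096
journal_blocks = 32768

# Every group except possibly the last is full, so round the group count up.
groups = -(-(blocks - first_data_block) // blocks_per_group)
inodes = groups * inodes_per_group
reserved = blocks * reserved_percent // 100      # truncated, as printed
journal_mib = journal_blocks * block_size // 2**20

print(groups, inodes, reserved, journal_mib)     # 284 4643968 463867 128
```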


From coywolf at sosdg.org  Sun Apr  9 03:50:25 2006
From: coywolf at sosdg.org (Coywolf Qi Hunt)
Date: Sat, 8 Apr 2006 23:50:25 -0400
Subject: Table creation failed
In-Reply-To: <44387903.5090207@assiniemafia.com>
References: <44387903.5090207@assiniemafia.com>
Message-ID: <20060409035025.GA28159@everest.sosdg.org>

On Sun, Apr 09, 2006 at 05:01:23AM +0200, jerume wrote:
> Hello,
> 
> I am writing because there is something I don't understand:
> 
> I'm using udev on Debian sid with a 2.6.15.1 kernel.
> 
> I have created a degraded RAID at /dev/md0.
> When I ran mkfs.ext3 /dev/md0, I got:
> 
> mke2fs 1.39-WIP (29-Mar-2006)
> Filesystem label=
> OS type: Linux
> Block size=4096 (log=2)
> Fragment size=4096 (log=2)
> 4643968 inodes, 9277344 blocks
> 463867 blocks (5.00%) reserved for the super user
> First data block=0
> 284 block groups
> 32768 blocks per group, 32768 fragments per group
> 16352 inodes per group
> Superblock backups stored on blocks:
>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>        4096000, 7962624
> 
> Writing inode tables: done                           
> Creating journal (32768 blocks): mkfs.ext3: Device or resource busy
>        while trying to create journal
> 
> zsh: exit 1     mkfs.ext3 /dev/md0
> 
> Could you please help me?

Look at http://thunk.org/hg/e2fsprogs/?cs=1bfd437f2f61

	Coywolf


From jerume at assiniemafia.com  Mon Apr 10 12:28:22 2006
From: jerume at assiniemafia.com (jerume)
Date: Mon, 10 Apr 2006 14:28:22 +0200
Subject: Table creation failed
In-Reply-To: <20060409035025.GA28159@everest.sosdg.org>
References: <44387903.5090207@assiniemafia.com>
	<20060409035025.GA28159@everest.sosdg.org>
Message-ID: <443A4F66.9030602@assiniemafia.com>

Coywolf Qi Hunt wrote:
> On Sun, Apr 09, 2006 at 05:01:23AM +0200, jerume wrote:
>   
>> Hello,
>>
>> I am writing because there is something I don't understand:
>>
>> I'm using udev on Debian sid with a 2.6.15.1 kernel.
>>
>> I have created a degraded RAID at /dev/md0.
>> When I ran mkfs.ext3 /dev/md0, I got:
>>
>> mke2fs 1.39-WIP (29-Mar-2006)
>> Filesystem label=
>> OS type: Linux
>> Block size=4096 (log=2)
>> Fragment size=4096 (log=2)
>> 4643968 inodes, 9277344 blocks
>> 463867 blocks (5.00%) reserved for the super user
>> First data block=0
>> 284 block groups
>> 32768 blocks per group, 32768 fragments per group
>> 16352 inodes per group
>> Superblock backups stored on blocks:
>>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>        4096000, 7962624
>>
>> Writing inode tables: done                           
>> Creating journal (32768 blocks): mkfs.ext3: Device or resource busy
>>        while trying to create journal
>>
>> zsh: exit 1     mkfs.ext3 /dev/md0
>>
>> Could you please help me?
>>     
>
> Look at http://thunk.org/hg/e2fsprogs/?cs=1bfd437f2f61
>
> 	Coywolf
>   
Thank you :)
I would rather wait for the updated package, unless it will take too long.
Let me know what you think ;)

bye


From jbglaw at lug-owl.de  Mon Apr 10 16:00:52 2006
From: jbglaw at lug-owl.de (Jan-Benedict Glaw)
Date: Mon, 10 Apr 2006 18:00:52 +0200
Subject: deleting partition does not effect superblock?
In-Reply-To: <Pine.LNX.4.61.0604101725130.922@yvahk01.tjqt.qr>
References: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
	<20060406065832.GK13324@lug-owl.de>
	<Pine.LNX.4.61.0604101725130.922@yvahk01.tjqt.qr>
Message-ID: <20060410160052.GO13324@lug-owl.de>

On Mon, 2006-04-10 17:28:18 +0200, Jan Engelhardt <jengelh at linux01.gwdg.de> wrote:
> >deleted) or otherwise modified. So it's perfectly okay to delete such
> >a container (eg. remove start and end from the partition table) and
> >recreate it at some time later (by adding those values back to the
> >partition table.)  As long as the new container starts at the same
> >location, a filesystem driver will be able to find the old
> >information. If you start a block later, it won't find its
> >superblocks.
> >
> If using a filesystem with replicated superblocks (ext*, xfs), then ...?
> [Includes expecting weird breakage.]

I'll possibly test if this works in another life...
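On the replicated-superblock question: with the sparse_super feature, ext2/ext3 keeps backup superblocks only in group 1 and in groups that are powers of 3, 5 and 7. Each backup sits at a fixed block number counted from the start of the filesystem, so the backups move with the container's start just as the primary superblock does. Here is a minimal Python sketch of where they land. It assumes sparse_super and a first data block of 0 (4 KiB blocks). It reproduces the backup list mke2fs printed in the "Table creation failed" thread (284 groups of 32768 blocks).

```python
def sparse_backup_groups(group_count):
    """Groups holding backup superblocks under sparse_super: 1 and powers of 3, 5, 7."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)

def backup_superblock_blocks(group_count, blocks_per_group, first_data_block=0):
    """Block numbers of the backup superblocks, relative to the filesystem start."""
    return [first_data_block + g * blocks_per_group
            for g in sparse_backup_groups(group_count)]

print(backup_superblock_blocks(284, 32768))
```

These are also the block numbers one would pass to `e2fsck -b` when the primary superblock is damaged. They are only meaningful if the container still starts where the filesystem was created.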

Regards, JBG


From tytso at mit.edu  Mon Apr 10 16:08:37 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 10 Apr 2006 12:08:37 -0400
Subject: Table creation failed
In-Reply-To: <443A4F66.9030602@assiniemafia.com>
References: <44387903.5090207@assiniemafia.com>
	<20060409035025.GA28159@everest.sosdg.org>
	<443A4F66.9030602@assiniemafia.com>
Message-ID: <20060410160837.GB24654@thunk.org>

On Mon, Apr 10, 2006 at 02:28:22PM +0200, jerume wrote:
> >>Writing inode tables: done                           
> >>Creating journal (32768 blocks): mkfs.ext3: Device or resource busy
> >>       while trying to create journal
> >>
> >
> >Look at http://thunk.org/hg/e2fsprogs/?cs=1bfd437f2f61
> >
> I would rather wait for the updated package, unless it will take too long.
> Let me know what you think ;)

I just put out a new WIP release (09-Apr-2006) last night/this morning
which fixes both this and the AMD64 build bug that were biting folks with the
last WIP release.  It can be found at:

	https://sourceforge.net/project/showfiles.php?group_id=2406

							- Ted


From sev at bnl.gov  Tue Apr 11 16:34:12 2006
From: sev at bnl.gov (Sev Binello)
Date: Tue, 11 Apr 2006 12:34:12 -0400
Subject: ext3 filesystem corruption
Message-ID: <443BDA84.7010102@bnl.gov>

Hi -

    We have had 3 rather major occurrences of ext3 filesystem corruption
    lately, i.e. so bad we couldn't even mount, and fsck didn't help.

    I am looking for pointers that could help us investigate the root
    cause.

    In general...

    We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
    2.4.21-37.ELsmp.

    We have a small SAN system that looks like this:

          3 NFS servers, each containing 2 QLogic HBAs connected to 2
          QLogic switches, connected to an nStor (now Xyratex) 6TB RAID
          system containing 2 (active-active) controllers.

  On the first 2 occasions, one of the controllers was failed over.
  On the 3rd occasion, both SAN switches lost power, and the hosts and
  RAID lost communication.

  On all occasions the QLogic failover driver tried to start up on the
  alternate HBA.

  On the first 2 instances we sort of tried to blame the controller.
  On the 3rd, that was harder to do, since the RAID system and the hosts
  stayed up but lost communication.

  I can provide more detail if anyone has any info on how to proceed.

Thanks
-Sev

-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From adilger at clusterfs.com  Tue Apr 11 17:28:56 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 11 Apr 2006 11:28:56 -0600
Subject: ext3 filesystem corruption
In-Reply-To: <443BDA84.7010102@bnl.gov>
References: <443BDA84.7010102@bnl.gov>
Message-ID: <20060411172856.GA17364@schatzie.adilger.int>

On Apr 11, 2006  12:34 -0400, Sev Binello wrote:
>    We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
>    2.4.21-37.ELsmp.
> 
>    We have a small SAN system that looks like this:
> 
>          3 NFS servers, each containing 2 QLogic HBAs connected to 2
>          QLogic switches, connected to an nStor (now Xyratex) 6TB RAID
>          system containing 2 (active-active) controllers.

Does this imply you have a 6TB ext3 filesystem?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From sev at bnl.gov  Wed Apr 12 23:28:40 2006
From: sev at bnl.gov (Sev Binello)
Date: Wed, 12 Apr 2006 19:28:40 -0400
Subject: ext3 filesystem corruption - more info
In-Reply-To: <443BDA84.7010102@bnl.gov>
References: <443BDA84.7010102@bnl.gov>
Message-ID: <443D8D28.3090202@bnl.gov>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20060412/c052ef57/attachment.htm>

From menscher at uiuc.edu  Thu Apr 13 00:06:11 2006
From: menscher at uiuc.edu (Damian Menscher)
Date: Wed, 12 Apr 2006 19:06:11 -0500 (CDT)
Subject: ext3 filesystem corruption - more info
In-Reply-To: <443D8D28.3090202@bnl.gov>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
Message-ID: <Pine.LNX.4.63.0604121904140.13237@zeus.itg.uiuc.edu>

I've seen similar errors when attempting to have a >2TB filesystem on a 
32-bit RHEL3 machine.  We have since implemented a 3.5TB filesystem on a 
64-bit RHEL4 machine.
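The >2TB observation, and the later remark that the RAID LUNs are all under 2TB, both point at a limit that was common in that era. Here is a minimal Python sketch of the arithmetic. It assumes block devices are addressed with 32-bit counts of 512-byte sectors. Whether this is the exact limit in the RHEL3 2.4.21 kernel and the QLogic driver stack involved is an assumption, not something established in the thread.

```python
# Assumption: devices are addressed with a 32-bit count of 512-byte sectors.
SECTOR_BYTES = 512
TB = 10**12          # decimal terabyte, as storage vendors quote sizes
TIB = 2**40          # binary tebibyte

max_bytes = 2**32 * SECTOR_BYTES
print(f"2^32 sectors x 512 B = {max_bytes / TIB:.0f} TiB ({max_bytes / TB:.2f} TB)")

def fits_32bit_sectors(size_tb):
    """True if a device of size_tb decimal terabytes fits in 2^32 sectors."""
    return size_tb * TB <= max_bytes

for size_tb in (6.0, 1.8):
    verdict = "fits" if fits_32bit_sectors(size_tb) else "exceeds"
    print(f"{size_tb} TB device: {verdict} the 32-bit sector limit")
```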

It would help if you could answer the question Andreas Dilger posed:

"Does this imply you have a 6TB ext3 filesystem?"

Damian

On Wed, 12 Apr 2006, Sev Binello wrote:

> 
> Hi -
> 
> In case this helps, we got the following messages from EXT3 before
> the filesystem failed.  Does anyone recognize these?
> 
> // seems to mount okay
> Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,33), internal journal
> Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.
> Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem with ordered data mode.
> 
> // as soon as NFS clients start, we get a TON of errors like this
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
> Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125
> 
> //interspersed with some of these
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> 
> Then we had to reboot, and basically the filesystem is shot.
> 
> Thanks
> -Sev
> 
> Sev Binello wrote:
>       Hi -
>
>          We have had 3 rather major occurrences of ext3 filesystem
>          corruption lately, i.e. so bad we couldn't even mount, and fsck
>          didn't help.
>
>          I am looking for pointers that could help us investigate the
>          root cause.
>
>          In general...
>
>          We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
>          2.4.21-37.ELsmp.
>
>          We have a small SAN system that looks like this:
>
>                3 NFS servers, each containing 2 QLogic HBAs connected to
>                2 QLogic switches, connected to an nStor (now Xyratex) 6TB
>                RAID system containing 2 (active-active) controllers.
>
>        On the first 2 occasions, one of the controllers was failed over.
>        On the 3rd occasion, both SAN switches lost power, and the hosts
>        and RAID lost communication.
>
>        On all occasions the QLogic failover driver tried to start up on
>        the alternate HBA.
>
>        On the first 2 instances we sort of tried to blame the controller.
>        On the 3rd, that was harder to do, since the RAID system and the
>        hosts stayed up but lost communication.
>
>        I can provide more detail if anyone has any info on how to proceed.
>
>       Thanks
>       -Sev
> 
> 
>
>  -- 
> 
> Sev Binello
> Brookhaven National Laboratory
> Upton, New York
> 631-344-5647
> sev at bnl.gov
> 
>

Damian Menscher


From sev at bnl.gov  Thu Apr 13 00:20:48 2006
From: sev at bnl.gov (Sev Binello)
Date: Wed, 12 Apr 2006 20:20:48 -0400
Subject: ext3 filesystem corruption - more info
In-Reply-To: <Pine.LNX.4.63.0604121904140.13237@zeus.itg.uiuc.edu>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
	<Pine.LNX.4.63.0604121904140.13237@zeus.itg.uiuc.edu>
Message-ID: <443D9960.20402@bnl.gov>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20060412/98e34a36/attachment.htm>

From adilger at clusterfs.com  Thu Apr 13 05:40:56 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 12 Apr 2006 23:40:56 -0600
Subject: ext3 filesystem corruption - more info
In-Reply-To: <443D8D28.3090202@bnl.gov>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
Message-ID: <20060413054056.GP17364@schatzie.adilger.int>

On Apr 12, 2006  19:28 -0400, Sev Binello wrote:
[HTML-only email] - it would be preferred if you used plain text, or at
least multipart/mixed for your email to this list...

> // as soon as NFS clients start, we get a TON of errors like this
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
> Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125

> //interspersed with some of these
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device

These indicate that the kernel ext3 code detected serious corruption of the
metadata on the filesystem.  In cases like this, if the filesystem isn't
remounted read-only (e.g. by mounting with "-o errors=remount-ro"), continuing
to use it just makes the corruption progressively worse.
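One way to act on this advice is to audit /etc/fstab for ext3 entries that lack errors=remount-ro. Here is a minimal Python sketch. It assumes standard whitespace-separated fstab fields, and the sample entries are hypothetical. It only inspects mount options. The default error behaviour can also be stored in the superblock (see `tune2fs -e`), and this sketch does not read that.

```python
def ext3_entries_without_remount_ro(fstab_text):
    """Return mount points of ext3 fstab entries whose options lack errors=remount-ro."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue                           # malformed entry; skipped in this sketch
        _spec, mount_point, fstype, options = fields[:4]
        if fstype == "ext3" and "errors=remount-ro" not in options.split(","):
            missing.append(mount_point)
    return missing

SAMPLE = """\
# hypothetical entries
/dev/sdc1   /export/a   ext3   defaults                     1 2
/dev/sdd1   /export/b   ext3   defaults,errors=remount-ro   1 2
"""
print(ext3_entries_without_remount_ro(SAMPLE))   # ['/export/a']
```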

These errors don't point to a root cause, however.
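The "attempt to access beyond end of device" lines can at least be put in scale. Here is a minimal Python sketch that parses them. It assumes the 2.4 kernel reports want and limit in 1 KiB units, which is how the 2.4 block layer sized devices; treat this as an assumption. Under that reading, the device is about 1.76 TB, matching the 1.8TB LUNs mentioned. The requests land roughly 100 to 160 GiB past the device's end, consistent with garbage block pointers rather than a slightly mis-sized device.

```python
import re

# The two detail lines from the log quoted above.
LOG = """\
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
"""
KIB = 1024  # assumed unit of want/limit on 2.4 kernels
PATTERN = re.compile(r"([0-9a-f]{2}:[0-9a-f]{2}): rw=(\d+), want=(\d+), limit=(\d+)")

def beyond_end(log_text):
    """Return (device, overshoot_kib, limit_kib) for each beyond-end-of-device line."""
    results = []
    for m in PATTERN.finditer(log_text):
        want, limit = int(m.group(3)), int(m.group(4))
        results.append((m.group(1), want - limit, limit))
    return results

for dev, over_kib, limit_kib in beyond_end(LOG):
    print(f"dev {dev}: device ~{limit_kib * KIB / 1e12:.2f} TB, "
          f"request {over_kib * KIB / 2**30:.1f} GiB past the end")
```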

> Would it be a problem if the two 1.8TB systems appeared on one host?

No, some of our customers have hundreds of systems with two ext3 filesystems
of about this size, running on 2.4.21-RHEL3 kernels.  The LUNs exported from
the RAID storage are all under 2TB.  They have never reported similar problems
over several years of usage.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From sev at bnl.gov  Thu Apr 13 14:40:25 2006
From: sev at bnl.gov (Sev Binello)
Date: Thu, 13 Apr 2006 10:40:25 -0400
Subject: ext3 filesystem corruption - more info
In-Reply-To: <20060413054056.GP17364@schatzie.adilger.int>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
	<20060413054056.GP17364@schatzie.adilger.int>
Message-ID: <443E62D9.4060404@bnl.gov>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20060413/6306d0da/attachment.htm>

From sev at bnl.gov  Thu Apr 13 19:54:50 2006
From: sev at bnl.gov (Sev Binello)
Date: Thu, 13 Apr 2006 15:54:50 -0400
Subject: ext3 filesystem corruption - more info
In-Reply-To: <20060413192909.GV17364@schatzie.adilger.int>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
	<20060413054056.GP17364@schatzie.adilger.int>
	<443E62D9.4060404@bnl.gov>
	<20060413192909.GV17364@schatzie.adilger.int>
Message-ID: <443EAC8A.9020209@bnl.gov>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20060413/187a483f/attachment.htm>

From sev at bnl.gov  Thu Apr 13 20:40:40 2006
From: sev at bnl.gov (Sev Binello)
Date: Thu, 13 Apr 2006 16:40:40 -0400
Subject: ext3 filesystem corruption - more info (in text)
In-Reply-To: <443EAC8A.9020209@bnl.gov>
References: <443BDA84.7010102@bnl.gov>
	<443D8D28.3090202@bnl.gov>	<20060413054056.GP17364@schatzie.adilger.int>	<443E62D9.4060404@bnl.gov>	<20060413192909.GV17364@schatzie.adilger.int>
	<443EAC8A.9020209@bnl.gov>
Message-ID: <443EB748.20606@bnl.gov>


Sorry about all the HTML.
Resending the last message in text.

Sev Binello wrote:
> Andreas Dilger wrote:
> 
>>On Apr 13, 2006  10:40 -0400, Sev Binello wrote:
>>[ still HTML-only email, extracting text from HTML is getting dull ]
>>  
>>
>>>Since it seemed to mount okay only 3 mins earlier,
>>>can we assume that it was initially uncorrupted?
>>>Or is that not a valid assumption?
>>>    
>>>
>>
>>No, at mount time there is only very cursory checking done of the group
>>descriptors and superblock.  The corruption reported appears to be from
>>bad indirect blocks.
>>
>>  
>>
>>>Is there anything that we can check, test, etc.?
>>>Any advice or action at this point is better than waiting for the next
>>>filesystem disaster to occur.
>>>    
>>>
>>
>>Do you run with write cache enabled on your device?  That can potentially
>>cause filesystem corruption even in the face of ext3 journaling, because
>>the journal atomicity guarantees are lost when the device reports a write
>>is complete on disk when it really isn't.
>>  
>>
> The RAID system does run with write-back cache enabled.
> I don't believe the actual drives have this enabled, but I'd have to check.
> 
> But we didn't actually lose power on the RAID or hosts,
> just the connecting switches, so we lost all communication.
> Presumably, in this situation the controller cache should have been emptied.
> Is my reasoning correct here?
> 
> Either way, you are saying it is best to avoid write caching in the future.
> 
> Also, in looking at and comparing error messages in the log files,
> I noticed that on the host where the corruption occurred,
> the call to abort the journal didn't seem to actually happen for an hour.
> Does that have any significance?
> 
>     Mar 25 14:38:52 acnlin83 kernel: Error (-5) on journal on device 08:21
>     Mar 25 14:38:52 acnlin83 kernel: Aborting journal on device sd(8,33).
> 
>     1 hr gap
>     Mar 25 15:39:19 acnlin83 kernel: ext3_abort called.
>     Mar 25 15:39:19 acnlin83 kernel: EXT3-fs abort (device sd(8,33)): ext3_journal_start: Detected aborted journal
>     Mar 25 15:39:19 acnlin83 kernel: Remounting filesystem read-only
>     Mar 25 15:39:19 acnlin83 kernel: EXT3-fs error (device sd(8,33)) in start_transaction: Journal has aborted
>  
> Thanks again
> -Sev
> 
>>Cheers, Andreas
>>--
>>Andreas Dilger
>>Principal Software Engineer
>>Cluster File Systems, Inc.
>>
>>  
>>
> 
> 
> -- 
> 
> Sev Binello
> Brookhaven National Laboratory
> Upton, New York
> 631-344-5647
> sev at bnl.gov
> 
> 


-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From adilger at clusterfs.com  Thu Apr 13 22:12:10 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 13 Apr 2006 16:12:10 -0600
Subject: ext3 filesystem corruption - more info (in text)
In-Reply-To: <443EB748.20606@bnl.gov>
References: <443BDA84.7010102@bnl.gov> <443D8D28.3090202@bnl.gov>
	<20060413054056.GP17364@schatzie.adilger.int>
	<443E62D9.4060404@bnl.gov>
	<20060413192909.GV17364@schatzie.adilger.int>
	<443EAC8A.9020209@bnl.gov> <443EB748.20606@bnl.gov>
Message-ID: <20060413221210.GA17364@schatzie.adilger.int>

On Apr 13, 2006  16:40 -0400, Sev Binello wrote:
>Andreas Dilger wrote:
>>Do you run with write cache enabled on your device?  That can potentially
>>cause filesystem corruption even in the face of ext3 journaling, because
>>the journal atomicity guarantees are lost when the device reports a write
>>is complete on disk when it really isn't.
>> 
>>
>The RAID system does run with write-back cache enabled.
>I don't believe the actual drives have this enabled, but I'd have to
>check.
>
>But we didn't actually lose power on the RAID or hosts,
>just the connecting switches, so we lost all communication.
>Presumably, in this situation the controller cache should have been
>emptied.  Is my reasoning correct here?

Correct.  If your RAID has w/b cache enabled, but is battery backed, you
should be OK.

Beyond this, I'm not sure what else you can look at.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From martial at server101.com  Thu Apr 13 22:30:19 2006
From: martial at server101.com (Martial Herbaut)
Date: Fri, 14 Apr 2006 08:30:19 +1000 (EST)
Subject: ext3 filesystem corruption - more info (in text)
In-Reply-To: <20060413221210.GA17364@schatzie.adilger.int>
Message-ID: <Pine.LNX.4.44.0604140814550.30444-100000@support.server101.com>


> >But we didn't actually lose power on the raid or hosts
> >just the connecting switches, so we lost all communication.
> >Presumably, in this situation  the controller cache should have been 
> >emptied Is my reasoning correct here ?
> 
> Correct.  If your RAID has w/b cache enabled, but is battery backed, you
> should be OK.
> 
> Beyond this, I'm not sure what else you can look at.
> 

I don't mean to barge in; however, I have seen similar corruption happen in 
the past where the fabric went away momentarily, like unplugging and 
replugging a fibre cable on a non-dualpath/failover setup, but the host 
was not killed/rebooted. From memory, the corruption was not immediately 
apparent and only became so later. 

I think the best thing to do in that scenario is to force a reboot of 
the host and then force an fsck, as opposed to continuing on and hoping for 
the best.

Martial Herbaut
---------------
Server101.com


From sev at bnl.gov  Fri Apr 14 14:21:56 2006
From: sev at bnl.gov (Sev Binello)
Date: Fri, 14 Apr 2006 10:21:56 -0400
Subject: ext3 filesystem corruption - more info (in text)
In-Reply-To: <Pine.LNX.4.44.0604140814550.30444-100000@support.server101.com>
References: <Pine.LNX.4.44.0604140814550.30444-100000@support.server101.com>
Message-ID: <443FB004.2040809@bnl.gov>


Thanks for the suggestion.
It seems reasonable, but unfortunately on an operational system
it means a lot of downtime,
though we end up there anyway.

Thanks
-Sev


-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From jlb17 at duke.edu  Fri Apr 14 22:31:27 2006
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri, 14 Apr 2006 18:31:27 -0400 (EDT)
Subject: Ext3 and 3ware RAID5
Message-ID: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>

I run a decent amount of 3ware hardware, all under centos-4.  There seems 
to be some sort of fundamental disagreement between ext3 and 3ware's 
hardware RAID5 mode that trashes write performance.  As a representative 
example, one current setup is 2 9550SX-12 boards in hardware RAID5 mode 
(256KB stripe size) with a software RAID0 stripe on top (also 256KB 
chunks).  bonnie++ results look like this:

mount -t ext3
175 MB/s writes, 352 MB/s reads

mount -t ext3 -o data=writeback
185 MB/s writes, 254 MB/s reads

mount -t ext2
340 MB/s writes, 266 MB/s reads

XFS on this hardware gets (untuned) about 300 MB/s writes and 400 MB/s 
reads.  The hardware itself is capable of more, and those results are 
representative of several different configs of hardware and software RAID 
options.
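Normalizing these figures against the default ext3 run makes the size of the write gap easier to see. This is only an illustrative Python sketch: the numbers are copied from the results above, and the XFS entry uses the approximate, untuned figures quoted.

```python
# Throughput in MB/s as reported in the bonnie++ runs above.
# The XFS figures are the approximate, untuned numbers quoted in the text.
RESULTS = {
    "ext3": {"write": 175, "read": 352},
    "ext3 data=writeback": {"write": 185, "read": 254},
    "ext2": {"write": 340, "read": 266},
    "xfs (approx.)": {"write": 300, "read": 400},
}


def relative_to(baseline: str, metric: str) -> dict:
    """Divide every configuration's throughput by the baseline's for one metric."""
    base = RESULTS[baseline][metric]
    return {name: figures[metric] / base for name, figures in RESULTS.items()}


if __name__ == "__main__":
    for metric in ("write", "read"):
        for name, ratio in relative_to("ext3", metric).items():
            print(f"{metric:5s} {name:20s} {ratio:.2f}x ext3")
```

On writes, ext2 comes out at about 1.94x the default ext3 figure, and XFS at about 1.71x.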

Any ideas as to what leads to ext3's performance hit?  I've tested *lots* 
of configurations.  It's not the md layer -- 1 card setups see the same 
performance hit.

Thanks.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


From alex at alex.org.uk  Sat Apr 15 11:18:00 2006
From: alex at alex.org.uk (Alex Bligh)
Date: Sat, 15 Apr 2006 12:18:00 +0100
Subject: Ext3 and 3ware RAID5
In-Reply-To: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>
References: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>
Message-ID: <B712C2FA45F5C886AC0E072E@[192.168.100.25]>


--On 14 April 2006 18:31 -0400 Joshua Baker-LePain <jlb17 at duke.edu> wrote:

> Any ideas as to what leads to ext3's performance hit?  I've tested *lots*
> of configurations.  It's not the md layer -- 1 card setups see the same
> performance hit.

No idea, but I suffer the same problem with a 9550SX-4 with SATA3
drives. I don't think ext3 is the problem as dd gives much the same
behaviour (you might want to try it). My reluctant conclusion is
that the 9550SX is just dog slow, which is why the drives are currently
sitting idle. I would love someone from 3ware to disprove this.

Alex


From jlb17 at duke.edu  Sat Apr 15 11:33:48 2006
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Sat, 15 Apr 2006 07:33:48 -0400 (EDT)
Subject: Ext3 and 3ware RAID5
In-Reply-To: <B712C2FA45F5C886AC0E072E@[192.168.100.25]>
References: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>
	<B712C2FA45F5C886AC0E072E@[192.168.100.25]>
Message-ID: <Pine.LNX.4.62.0604150730590.20274@chaos.egr.duke.edu>

On Sat, 15 Apr 2006 at 12:18pm, Alex Bligh wrote

>
>
> --On 14 April 2006 18:31 -0400 Joshua Baker-LePain <jlb17 at duke.edu> wrote:
>
>> Any ideas as to what leads to ext3's performance hit?  I've tested *lots*
>> of configurations.  It's not the md layer -- 1 card setups see the same
>> performance hit.
>
> No idea, but I suffer the same problem with a 9550SX-4 with SATA3
> drives. I don't think ext3 is the problem as dd gives much the same
> behaviour (you might want to try it). My reluctant conclusion is
> that the 9550SX is just dog slow, which is why the drives are currently
> sitting idle. I would love someone from 3ware to disprove this.

When ext2 is almost 2X faster than ext3 at writing, it points pretty 
firmly at something in the journaling code (IMO).  And a journaling FS can 
do decently with this hardware, as XFS also gets ~300MB/s writing.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


From alex at alex.org.uk  Sat Apr 15 11:44:58 2006
From: alex at alex.org.uk (Alex Bligh)
Date: Sat, 15 Apr 2006 12:44:58 +0100
Subject: Ext3 and 3ware RAID5
In-Reply-To: <Pine.LNX.4.62.0604150730590.20274@chaos.egr.duke.edu>
References: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>
	<B712C2FA45F5C886AC0E072E@[192.168.100.25]>
	<Pine.LNX.4.62.0604150730590.20274@chaos.egr.duke.edu>
Message-ID: <9EDC1C2D4A1EBDB4CC54DF2B@[192.168.100.25]>


--On 15 April 2006 07:33 -0400 Joshua Baker-LePain <jlb17 at duke.edu> wrote:

>> No idea, but I suffer the same problem with a 9550SX-4 with SATA3
>> drives. I don't think ext3 is the problem as dd gives much the same
>> behaviour (you might want to try it). My reluctant conclusion is
>> that the 9550SX is just dog slow, which is why the drives are currently
>> sitting idle. I would love someone from 3ware to disprove this.
>
> When ext2 is almost 2X faster than ext3 at writing, it points pretty
> firmly at something in the journaling code (IMO).  And a journaling FS
> can do decently with this hardware, as XFS also gets ~300MB/s writing.

Sorry, I forgot to address that point. Writes seem to be especially slow
(there is/was some stuff on the 3-ware site saying slow writes under Linux
were a known problem). I am presuming that ext3 simply does more writing
(even if the extra writes are small and discontiguous but numerous) than
ext2 due to journalling, and this shows up the poor performance.

You might run some benchmarks on the raw partition and see just how
slow writes are. You might also take a look at the 3ware site as I
think there might have been some tuning options they suggested which
allegedly improved things (I gave up) - stripe size comes to mind.
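A rough way to measure raw write speed, similar in spirit to a dd run followed by a sync, is a short Python timing loop. This is a sketch rather than a calibrated benchmark, and the function name and defaults are illustrative. Pointing `path` at a raw partition (a hypothetical /dev/sdX1, for example) will overwrite whatever is on it.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(path=None, total_mib=64, chunk_kib=1024):
    """Write total_mib of zeros to path in chunk_kib pieces, fsync, return MiB/s.

    The timing includes the final fsync(), so the result reflects data handed
    to the storage device rather than data parked in the page cache.
    """
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "write_test.dat")
    chunk = memoryview(b"\0" * (chunk_kib * 1024))
    n_chunks = (total_mib * 1024) // chunk_kib
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_chunks):
            written = 0
            while written < len(chunk):  # os.write() may write less than asked
                written += os.write(fd, chunk[written:])
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (n_chunks * chunk_kib / 1024) / elapsed
```

If you run the same test once against a file on ext2 and once against a file on ext3 on the same array, you can separate controller slowness from filesystem overhead.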

Anyway, do let me know if you find the answer...

Alex


From julius.junghans at gmx.de  Sat Apr 15 11:54:05 2006
From: julius.junghans at gmx.de (julius Junghans)
Date: Sat, 15 Apr 2006 13:54:05 +0200
Subject: Partition not recognized by mount
Message-ID: <4440DEDD.1030706@gmx.de>

Hi,

somehow after a power failure i can't mount my ext3 partition :(

mount /dev/hdd2 /mnt/gentoo/
mount: you must specify the filesystem type

fdisk /dev/hdd

The number of cylinders for this disk is set to 484521.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hdd: 250.0 GB, 250059350016 bytes
16 heads, 63 sectors/track, 484521 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdd1               1         970      488848+  83  Linux
/dev/hdd2             971      155114    77688576   83  Linux


mount -t ext3 /dev/hdd2 /mnt/gentoo/
mount: wrong fs type, bad option, bad superblock on /dev/hdd2,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so


dmesg:
VFS: Can't find ext3 filesystem on dev hdd2.
VFS: Can't find ext3 filesystem on dev hdd2.


What can i do to get my data back?

Julius


From seanos at seanos.net  Sat Apr 15 11:57:20 2006
From: seanos at seanos.net (Sean O Sullivan)
Date: Sat, 15 Apr 2006 12:57:20 +0100
Subject: Ext3 and 3ware RAID5
In-Reply-To: <9EDC1C2D4A1EBDB4CC54DF2B@[192.168.100.25]>
References: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>	<B712C2FA45F5C886AC0E072E@[192.168.100.25]>	<Pine.LNX.4.62.0604150730590.20274@chaos.egr.duke.edu>
	<9EDC1C2D4A1EBDB4CC54DF2B@[192.168.100.25]>
Message-ID: <4440DFA0.5070703@seanos.net>

Alex Bligh wrote:
> 
> 
> --On 15 April 2006 07:33 -0400 Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> 
>>> No idea, but I suffer the same problem with a 9550SX-4 with SATA3
>>> drives. I don't think ext3 is the problem as dd gives much the same
>>> behaviour (you might want to try it). My reluctant conclusion is
>>> that the 9550SX is just dog slow, which is why the drives are currently
>>> sitting idle. I would love someone from 3ware to disprove this.
>>
>> When ext2 is almost 2X faster than ext3 at writing, it points pretty
>> firmly at something in the journaling code (IMO).  And a journaling FS
>> can do decently with this hardware, as XFS also gets ~300MB/s writing.
> 
I had similar problems with a 9500S-8, searched about, and eventually 
found some useful information.
Try mounting the volume with the 'noreservation' option.

Also, it is well worth your time tuning the read-ahead with 'blockdev',
for example: blockdev --setra 20000 /dev/sde
and then putting this in /etc/rc.local.

Note that the read-ahead setting is really something you just have to experiment with.
I tried values from 12000 to 26000, and some of the differences between settings
only 1000, or at times even 500, apart were amazing.
My volume is used for storage only and write speed is not too important, so
I really only did this out of curiosity.
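`blockdev --setra` takes its argument in 512-byte sectors (see blockdev(8)), so the 20000 in the example above is about 10000 KiB of read-ahead. On 2.6 kernels that expose it, /sys/block/<dev>/queue/read_ahead_kb reports the same setting in KiB. The Python sketch below converts between the two units and generates the commands for a sweep like the one described. The benchmark run between settings is left out, and /dev/sde is simply the device from the example.

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count 512-byte sectors


def setra_to_read_ahead_kb(sectors: int) -> int:
    """Convert a 'blockdev --setra' value to the equivalent read_ahead_kb value."""
    return sectors * SECTOR_BYTES // 1024


def readahead_sweep(device: str, start: int, stop: int, step: int) -> list:
    """Build one 'blockdev --setra' command per value from start to stop inclusive."""
    return [f"blockdev --setra {ra} {device}" for ra in range(start, stop + 1, step)]


if __name__ == "__main__":
    for cmd in readahead_sweep("/dev/sde", 12000, 26000, 1000):
        ra = int(cmd.split()[2])
        print(f"{cmd:35s} # ~{setra_to_read_ahead_kb(ra)} KiB of read-ahead")
```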

There is also a fair bit of interesting information in 3ware's knowledge 
base.

Regards,

Sean


From jlb17 at duke.edu  Sat Apr 15 12:23:38 2006
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Sat, 15 Apr 2006 08:23:38 -0400 (EDT)
Subject: Ext3 and 3ware RAID5
In-Reply-To: <4440DFA0.5070703@seanos.net>
References: <Pine.LNX.4.62.0604141820040.27432@chaos.egr.duke.edu>
	<B712C2FA45F5C886AC0E072E@[192.168.100.25]>
	<Pine.LNX.4.62.0604150730590.20274@chaos.egr.duke.edu>
	<9EDC1C2D4A1EBDB4CC54DF2B@[192.168.100.25]>
	<4440DFA0.5070703@seanos.net>
Message-ID: <Pine.LNX.4.62.0604150821160.20274@chaos.egr.duke.edu>

On Sat, 15 Apr 2006 at 12:57pm, Sean O Sullivan wrote

> Alex Bligh wrote:
>> 
>> --On 15 April 2006 07:33 -0400 Joshua Baker-LePain <jlb17 at duke.edu> wrote:
>> 
>>>> No idea, but I suffer the same problem with a 9550SX-4 with SATA3
>>>> drives. I don't think ext3 is the problem as dd gives much the same
>>>> behaviour (you might want to try it). My reluctant conclusion is
>>>> that the 9550SX is just dog slow, which is why the drives are currently
>>>> sitting idle. I would love someone from 3ware to disprove this.
>>> 
>>> When ext2 is almost 2X faster than ext3 at writing, it points pretty
>>> firmly at something in the journaling code (IMO).  And a journaling FS
>>> can do decently with this hardware, as XFS also gets ~300MB/s writing.
>> 
> I had similar problems with 9500S-8, and searched about, and eventually found 
> some useful information.
> Try mounting the volume with the 'noreservation' option.

AFAIK, that bug was fixed in the most recent centos/RHEL kernel.  In any 
case, noreservation made no difference.

> Also, it is well worth your time setting 'blockdev'
> for example : blockdev --setra 20000 /dev/sde
> and this put this in /etc/rc.local

Yeah, I've already played around with blockdev a lot.  It made some 
difference, but nothing extraordinary.


-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


From keld at dkuug.dk  Sun Apr 16 12:30:29 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Sun, 16 Apr 2006 14:30:29 +0200
Subject: e2fsck dies with signal 11
Message-ID: <20060416123029.GA11999@rap.rap.dk>

Hi 

I got a strange error, happening on two of my ext3 partitions.
What can be wrong? And why does e2fsck error out, instead of displaying
an error message?

Best regards
keld

fsck /dev/hda6
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
Warning... fsck.ext3 for device /dev/hda6 exited with signal 11.


Also, from my dmesg:


 <1>general protection fault: e7a8 [#3]
Modules linked in: i915 drm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd soundcore lp parport_pc ppdev parport ipt_REJECT ipt_LOG ipt_state ipt_pkttype ipt_set ipt_CONNMARK ipt_MARK ipt_ROUTE ipt_connmark ipt_owner ipt_recent ipt_iprange ipt_physdev ipt_multiport ipt_conntrack iptable_mangle ip_set_portmap ip_set_macipmap ip_set_ipmap ip_set_iphash ip_set ip_nat_irc ip_nat_tftp ip_nat_ftp iptable_nat ip_conntrack_irc ip_conntrack_tftp ip_conntrack_ftp ip_conntrack iptable_filter ip_tables 8139too mii af_packet ide_cd loop ext3 jbd nls_iso8859_1 nls_cp850 vfat fat intel_agp nvram amd64_agp agpgart evdev bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev dm_mod sata_vsc sata_via sata_svw sata_sil sata_promise sata_nv sx8 sata_uli sata_sx4 sata_sis sata_qstor pata_pdc2027x ahci BusLogic aic7xxx scsi_transport_spi sg sr_mod cdrom ata_piix libata reiserfs usb_storage sd_mod scsi_mod usbhid ohci_hcd ehci_hcd uhci_hcd usbcore
CPU:    0
EIP:    00c0:[<000023c1>]    Not tainted VLI
EFLAGS: 00210046   (2.6.12-oci6.mdk-i586-up-1GB) 
EIP is at 0x23c1
eax: 00000292   ebx: 00000001   ecx: 00000000   edx: 00000000
esi: ffffffff   edi: 00200014   ebp: bc569e5c   esp: bc569e54
ds: 00c8   es: 0000   ss: 0068
Process fsck.ext3 (pid: 4220, threadinfo=bc568000 task=b2f13020)
Stack: 462c44b1 00009e5c 000000c8 ffff0292 9e7000c0 00000001 530a0000 00200016 
       00b8467c 00000000 bc569ebc b0111311 00000060 bc569ebc 00200292 b11e007b 
       0020007b 00000000 b1292d98 00000000 00000000 a7df0000 bc560000 bc569f1a 
Call Trace:
 [<b010425b>] show_stack+0x9b/0xb0
 [<b01043ab>] show_registers+0x11b/0x190
 [<b0104575>] die+0xb5/0x130
 [<b0104d1a>] do_general_protection+0x13a/0x160
 [<b0103e5f>] error_code+0x4f/0x60
 [<ffff0292>] 0xffff0292
Code:  Bad EIP value.


From coywolf at sosdg.org  Sun Apr 16 13:55:54 2006
From: coywolf at sosdg.org (Coywolf Qi Hunt)
Date: Sun, 16 Apr 2006 09:55:54 -0400
Subject: Partition not recognized by mount
In-Reply-To: <4440DEDD.1030706@gmx.de>
References: <4440DEDD.1030706@gmx.de>
Message-ID: <20060416135554.GA30746@everest.sosdg.org>

On Sat, Apr 15, 2006 at 01:54:05PM +0200, julius Junghans wrote:
> Hi,
> 
> somehow after a power failure i can't mount my ext3 partition :(
> 
> mount /dev/hdd2 /mnt/gentoo/
> mount: you must specify the filesystem type
> 
> fdisk /dev/hdd
> 
> The number of cylinders for this disk is set to 484521.
> There is nothing wrong with that, but this is larger than 1024,
> and could in certain setups cause problems with:
> 1) software that runs at boot time (e.g., old versions of LILO)
> 2) booting and partitioning software from other OSs
>    (e.g., DOS FDISK, OS/2 FDISK)
> 
> Command (m for help): p
> 
> Disk /dev/hdd: 250.0 GB, 250059350016 bytes
> 16 heads, 63 sectors/track, 484521 cylinders
> Units = cylinders of 1008 * 512 = 516096 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/hdd1               1         970      488848+  83  Linux
> /dev/hdd2             971      155114    77688576   83  Linux
> 
> 
> mount -t ext3 /dev/hdd2 /mnt/gentoo/
> mount: wrong fs type, bad option, bad superblock on /dev/hdd2,
>        missing codepage or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail  or so
> 
> 
> dmesg:
> VFS: Can't find ext3 filesystem on dev hdd2.
> VFS: Can't find ext3 filesystem on dev hdd2.
> 
> 
> What can i do to get my data back?
> 
> Julius

What were you doing before the power failure?

I have lost my superblock before too, and I did get my filesystem back.
To get your filesystem back, you need to locate your backup superblocks.
I wrote a simple program to find my superblock last time; there is also one
in the e2fsprogs source package. Then you could use dd(1) to copy your backup
superblock onto your primary superblock, or you could try mounting with the sb=n option. Good luck.
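To make the backup-superblock step more concrete, here is a Python sketch of where mke2fs normally places the copies. It assumes the usual defaults: sparse_super, which puts copies in group 1 and in groups that are powers of 3, 5 and 7, and 8 × block size blocks per group. It also assumes a 4 KiB block size for /dev/hdd2, which the fdisk output does not confirm. The fdisk Blocks column counts 1 KiB units, so 77688576 KiB corresponds to 19422144 blocks of 4 KiB. The function names are hypothetical, and this is not the finder program shipped with e2fsprogs.

```python
def is_sparse_backup_group(group: int) -> bool:
    """Groups 0 and 1, plus powers of 3, 5 and 7, hold superblocks under sparse_super."""
    if group in (0, 1):
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


def backup_superblocks(total_blocks: int, block_size: int = 4096) -> list:
    """Block numbers (in filesystem blocks) of the backup superblocks."""
    blocks_per_group = 8 * block_size  # mke2fs default: one bitmap block per group
    first_data_block = 1 if block_size == 1024 else 0
    groups = -(-(total_blocks - first_data_block) // blocks_per_group)  # ceiling
    return [first_data_block + g * blocks_per_group
            for g in range(1, groups) if is_sparse_backup_group(g)]


def mount_sb_option(block: int, block_size: int) -> str:
    """mount(8) takes sb= in 1 KiB units regardless of the filesystem block size."""
    return f"sb={block * block_size // 1024}"
```

With a 4 KiB block size, the first candidate is block 32768. e2fsck can be pointed at it with `e2fsck -b 32768 -B 4096 /dev/hdd2`, and the equivalent mount option is sb=131072. If the block-size guess is wrong, none of these locations will hold a valid superblock.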


	Coywolf


From coywolf at sosdg.org  Sun Apr 16 14:18:38 2006
From: coywolf at sosdg.org (Coywolf Qi Hunt)
Date: Sun, 16 Apr 2006 10:18:38 -0400
Subject: e2fsck dies with signal 11
In-Reply-To: <20060416123029.GA11999@rap.rap.dk>
References: <20060416123029.GA11999@rap.rap.dk>
Message-ID: <20060416141838.GB30746@everest.sosdg.org>

On Sun, Apr 16, 2006 at 02:30:29PM +0200, Keld Jørn Simonsen wrote:
> Hi 
> 
> I got a strange error, happening on two of my ext3 partitions.
> What can be wrong? And why does e2fsck error out, instead of displaying
> an error message?
> 
> Best regards
> keld
> 
> fsck /dev/hda6
> fsck 1.38 (30-Jun-2005)
> e2fsck 1.38 (30-Jun-2005)
> Warning... fsck.ext3 for device /dev/hda6 exited with signal 11.

Please try with gdb to trace the problem.

	Coywolf


From tytso at mit.edu  Mon Apr 17 08:41:25 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 17 Apr 2006 04:41:25 -0400
Subject: e2fsck dies with signal 11
In-Reply-To: <20060416123029.GA11999@rap.rap.dk>
References: <20060416123029.GA11999@rap.rap.dk>
Message-ID: <20060417084125.GC13985@thunk.org>

The dmesg indicates that the kernel trapped a general protection fault
(GPF) in kernel space.  So this looks like some kind of kernel bug
which was triggered by e2fsck.  Unfortunately the EIP is invalid, so
it's hard to track down what might have caused it.  If this is
repeatable, I'd suggest using strace so we can see what e2fsck was
requesting of the kernel right before it triggered the kernel GPF
which killed the process.

						- Ted


From keld at dkuug.dk  Mon Apr 17 10:30:23 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Mon, 17 Apr 2006 12:30:23 +0200
Subject: e2fsck dies with signal 11
In-Reply-To: <20060417084125.GC13985@thunk.org>
References: <20060416123029.GA11999@rap.rap.dk>
	<20060417084125.GC13985@thunk.org>
Message-ID: <20060417103023.GA6782@rap.rap.dk>

On Mon, Apr 17, 2006 at 04:41:25AM -0400, Theodore Ts'o wrote:
> The dmesg indicates that the kernel trapped a general protection fault
> (GPF) in kernel space.  So this looks like some kind of kernel bug
> which was triggered by e2fsck.  Unfortunately the EIP is invalid, so
> it's hard to track down what might have caused it.  If this is
> repeatable, I'd suggest using strace so we can see what e2fsck was
> requesting of the kernel right before it triggered the kernel GPF
> which killed the process.

OK, here are the last words of an strace:


open("/etc/mtab", O_RDONLY)             = 3
stat64("/dev/hda6", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 6), ...}) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=524, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xa7d5c000
read(3, "/dev/hda9 / reiserfs rw,noatime,"..., 131072) = 524
stat64("/dev/hda9", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 9), ...}) = 0
stat64("none", 0xafa32710)              = -1 ENOENT (No such file or directory)
stat64("none", 0xafa32710)              = -1 ENOENT (No such file or directory)
stat64("none", 0xafa32710)              = -1 ENOENT (No such file or directory)
stat64("/dev/hda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 1), ...}) = 0
stat64("/dev/hda10", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 10), ...}) = 0
stat64("/dev/hda11", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 11), ...}) = 0
stat64("/dev/hda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 2), ...}) = 0
stat64("/dev/hda3", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 3), ...}) = 0
stat64("/dev/hda5", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 5), ...}) = 0
read(3, "", 131072)                     = 0
stat64("/", {st_mode=S_IFDIR|0755, st_size=520, ...}) = 0
close(3)                                = 0
munmap(0xa7d5c000, 131072)              = 0
stat64("/dev/hda6", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 6), ...}) = 0
open("/dev/hda6", O_RDONLY|O_EXCL)      = 3
close(3)                                = 0
open("/dev/hda6", O_RDWR|O_LARGEFILE)   = 3
uname({sys="Linux", node="localhost", ...}) = 0
lseek(3, 1024, SEEK_SET)                = 1024
read(3, "\0\326\6\0\177\252\r\0\354\256\0\0002\v\1\0\350,\4\0\0"..., 1024) = 1024
lseek(3, 4096, SEEK_SET)                = 4096
read(3, "\2\0\0\0\3\0\0\0\4\0\0\0\0\0|;=\1\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 16384, SEEK_SET)               = 16384
read(3, "\0\0\0\0\0\0\0\0\0\17.C\0\17.C\0\17.C\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 2084864, SEEK_SET)             = 2084864
read(3, "\300;9\230\0\0\0\4\0\0\0\0\0\0\20\0\0\0@\0\0\0\0\1\0\2"..., 4096) = 4096
open("/dev/hda6", O_RDONLY|O_LARGEFILE) = 4
uname({sys="Linux", node="localhost", ...}) = 0
ioctl(4, 0x80041272, 0xafa32698)        = 0
close(4)                                = 0
open("/proc/apm", O_RDONLY)             = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xa7d7b000
read(4,  <unfinished ...>
+++ killed by SIGSEGV +++
          

best regards
keld


From tytso at mit.edu  Mon Apr 17 11:15:32 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 17 Apr 2006 07:15:32 -0400
Subject: e2fsck dies with signal 11
In-Reply-To: <20060417103023.GA6782@rap.rap.dk>
References: <20060416123029.GA11999@rap.rap.dk>
	<20060417084125.GC13985@thunk.org>
	<20060417103023.GA6782@rap.rap.dk>
Message-ID: <20060417111532.GA23376@thunk.org>

On Mon, Apr 17, 2006 at 12:30:23PM +0200, Keld Jørn Simonsen wrote:
> open("/proc/apm", O_RDONLY)             = 4
	...
> read(4,  <unfinished ...>

This was caused by e2fsck trying to read from /proc/apm to see whether
your system was running on batteries.  /proc/apm exists
(or the open would have returned an error), but reading from it
apparently causes a kernel oops.  This is definitely a kernel bug, and
I suspect it can be reproduced by the shell command "cat /proc/apm".
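The deduction above can be automated for longer traces: fd 4 came from the open of /proc/apm, and the read on it never returned. The Python sketch below tracks open()/close() results to map descriptors to paths, then reports the last call that strace marks as unfinished. It only understands those two calls in a single-process trace, so treat it as an illustration rather than a general strace parser.

```python
import re

OPEN_RE = re.compile(r'^open\("([^"]*)",.*=\s*(\d+)\s*$')
CLOSE_RE = re.compile(r"^close\((\d+)\)")
UNFINISHED_RE = re.compile(r"^(\w+)\((\d+),.*<unfinished \.\.\.>")


def last_unfinished_call(strace_lines):
    """Return (syscall, fd, path) for the last call strace shows as unfinished.

    Only open() and close() are tracked, which is enough for a single-process
    trace like the one above; dup(), openat(), fork() etc. are ignored.
    """
    fd_paths = {}
    last = None
    for raw in strace_lines:
        line = raw.strip()
        if m := OPEN_RE.match(line):
            fd_paths[int(m.group(2))] = m.group(1)
        elif m := CLOSE_RE.match(line):
            fd_paths.pop(int(m.group(1)), None)
        elif m := UNFINISHED_RE.match(line):
            fd = int(m.group(2))
            last = (m.group(1), fd, fd_paths.get(fd))
    return last
```

A trace can be captured to a file with `strace -o trace.txt e2fsck /dev/hda6`, and its lines passed to the function.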

Recompiling the kernel with CONFIG_APM disabled is probably the most
expedient answer, since for most systems ACPI is more functional (and
in some cases, required).  Indeed, the APM code has been suffering
progressive bitrot, which probably explains the kernel oops.  You
could try sending a complaint to LKML if you really need APM
functionality for your laptop, and for some reason ACPI is not
sufficient for your needs.
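Before rebuilding, it may be worth confirming how the running kernel was configured. Many distributions ship the config as /boot/config-<version>, and kernels built with CONFIG_IKCONFIG_PROC expose it as /proc/config.gz. Which of these exists on this particular system is an assumption. Here is a minimal Python sketch for reading either file and looking up an option:

```python
import gzip


def read_kernel_config(path: str) -> str:
    """Read a kernel .config, transparently handling a gzipped /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


def config_value(config_text: str, option: str):
    """Return the value of option ('y', 'm', ...) or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None  # absent, or present only as '# CONFIG_FOO is not set'
```

For example, `config_value(read_kernel_config("/proc/config.gz"), "CONFIG_APM")` returns "y" or "m" when APM support is built in or modular.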

Regards,

						- Ted


From sev at bnl.gov  Mon Apr 17 19:22:25 2006
From: sev at bnl.gov (Sev Binello)
Date: Mon, 17 Apr 2006 15:22:25 -0400
Subject: EXT3-fs unexpected failure msg ?
Message-ID: <4443EAF1.8080807@bnl.gov>


Hi -

   We have had a RAID failure. We have somewhat recovered,
   but we continue to see the following ext3 message...

Apr 17 14:59:14 acnlin84 kernel: EXT3-fs unexpected failure: (((jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0);
Apr 17 14:59:14 acnlin84 kernel: Possible IO failure.


   Since we have experienced several instances of ext3 file system corruption
   when we lose total communication with our raid,
   we were wondering if there was any concrete advice out there
   on what to do in this situation.

Other messages we got before the ones above...
Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)): ext3_free_blocks: bit already cleared for block 14943160
Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)): ext3_free_blocks: bit already cleared for block 3703794

Apr 17 13:40:43 acnlin84 kernel: EXT3-fs error (device sd(8,65)): ext3_get_inode_loc: unable to read inode block - inode=50931914, block=101843272
-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From adilger at clusterfs.com  Mon Apr 17 23:51:56 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Mon, 17 Apr 2006 17:51:56 -0600
Subject: EXT3-fs unexpected failure msg ?
In-Reply-To: <4443EAF1.8080807@bnl.gov>
References: <4443EAF1.8080807@bnl.gov>
Message-ID: <20060417235156.GO17364@schatzie.adilger.int>

On Apr 17, 2006  15:22 -0400, Sev Binello wrote:
>   We have had a raid failure, we have some what recovered
>   but we continue to see the following ext3 message...
> 
> Apr 17 14:59:14 acnlin84 kernel: EXT3-fs unexpected failure: 
> (((jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0);
> Apr 17 14:59:14 acnlin84 kernel: Possible IO failure.
> 
> 
>   Since we have experienced several instances of ext3 file system corruption
>   when we lose total communication with our raid,
>   we were wondering if there was any concrete advice out there
>   on what to do in this situation.

You really, really, really need to mount your filesystem with
"-o errors=remount-ro", at least to prevent filesystem corruption.
I'm not sure if this is enough to prevent corruption in the case
of your RAID disconnects (if it doesn't generate errors up to the
filesystem, but still discards writes), but it is at least a minimum
requirement.
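For a filesystem mounted from /etc/fstab, the option goes in the fourth (options) field. Below is a small Python sketch that adds errors=remount-ro to one fstab line and replaces any existing errors= setting. The device and mount point in the test are hypothetical, the fields are re-joined with tabs, and the result should be reviewed before it is written back. The same default can also be recorded in the superblock with `tune2fs -e remount-ro`.

```python
def with_mount_option(fstab_line: str, option: str = "errors=remount-ro") -> str:
    """Add option to the options field of an fstab entry, replacing any entry
    with the same key (e.g. an existing errors=continue)."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # leave blank lines and comments untouched
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line  # no options field to edit
    key = option.split("=", 1)[0]
    opts = [o for o in fields[3].split(",") if o.split("=", 1)[0] != key]
    opts.append(option)
    fields[3] = ",".join(opts)
    return "\t".join(fields)
```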

> Other messages we got before the ones above...
> Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)): 
> ext3_free_blocks: bit already cleared for block 14943160
> Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)): 
> ext3_free_blocks: bit already cleared for block 3703794
> 
> Apr 17 13:40:43 acnlin84 kernel: EXT3-fs error (device sd(8,65)): 
> ext3_get_inode_loc: unable to read inode block - inode=50931914, 
> block=101843272
> -- 

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From menscher at uiuc.edu  Tue Apr 18 00:02:21 2006
From: menscher at uiuc.edu (Damian Menscher)
Date: Mon, 17 Apr 2006 19:02:21 -0500 (CDT)
Subject: EXT3-fs unexpected failure msg ?
In-Reply-To: <20060417235156.GO17364@schatzie.adilger.int>
References: <4443EAF1.8080807@bnl.gov>
	<20060417235156.GO17364@schatzie.adilger.int>
Message-ID: <Pine.LNX.4.63.0604171859340.25546@zeus.itg.uiuc.edu>

On Mon, 17 Apr 2006, Andreas Dilger wrote:
>
> You really, really, really need to mount your filesystem with
> "-o errors=remount-ro", at least to prevent filesystem corruption.
> I'm not sure if this is enough to prevent corruption in the case
> of your RAID disconnects (if it doesn't generate errors up to the
> filesystem, but still discards writes), but it is at least a minimum
> requirement.

Since this was so strongly-worded, I just did a random spot-check of 
some of our filesystems (RHEL4) and discovered they all have:

    Errors behavior:          Continue

in the superblock (and mount apparently takes that option).  This makes 
me curious: if it's so obvious that it should remount-ro on errors, why 
is the default (on RHEL4, at least) to continue?
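A spot check like this can be scripted. The Python sketch below parses the output of `dumpe2fs -h` (run as root against each device; collecting that output is left out). It flags the two header fields that come up in this thread: an "Errors behavior" of Continue, and a "Filesystem state" that records errors. The function names are illustrative.

```python
def parse_dumpe2fs_header(text: str) -> dict:
    """Turn 'Key:   value' lines from 'dumpe2fs -h' into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def review(fields: dict) -> list:
    """Flag the two superblock settings discussed in this thread."""
    notes = []
    if fields.get("Errors behavior", "").lower() == "continue":
        notes.append("errors behavior is Continue (tune2fs -e remount-ro would change it)")
    if "error" in fields.get("Filesystem state", "").lower():
        notes.append("superblock records errors; a forced e2fsck -f while unmounted is indicated")
    return notes
```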

Damian Menscher
-- 
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Ofc:(650)253-2757 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-


From sev at bnl.gov  Tue Apr 18 01:30:01 2006
From: sev at bnl.gov (Sev Binello)
Date: Mon, 17 Apr 2006 21:30:01 -0400
Subject: EXT3-fs unexpected failure msg ?
In-Reply-To: <Pine.LNX.4.63.0604171859340.25546@zeus.itg.uiuc.edu>
References: <4443EAF1.8080807@bnl.gov>	<20060417235156.GO17364@schatzie.adilger.int>
	<Pine.LNX.4.63.0604171859340.25546@zeus.itg.uiuc.edu>
Message-ID: <44444119.6000502@bnl.gov>

Damian Menscher wrote:
> On Mon, 17 Apr 2006, Andreas Dilger wrote:
> 
>>
>> You really, really, really need to mount your filesystem with
>> "-o errors=remount-ro", at least to prevent filesystem corruption.
>> I'm not sure if this is enough to prevent corruption in the case
>> of your RAID disconnects (if it doesn't generate errors up to the
>> filesystem, but still discards writes), but it is at least a minimum
>> requirement.
> 
> 
> Since this was so strongly-worded, I just did a random spot-check of 
> some of our filesystems (RHEL4) and discovered they all have:
> 
>    Errors behavior:          Continue
> 
> in the superblock (and mount apparently takes that option).  This makes 
> me curious: if it's so obvious that it should remount-ro on errors, why 
> is the default (on RHEL4, at least) to continue?
> 
> Damian Menscher

Aside from the fact that this is the current default setting for RHEL Linux systems,
though maybe not the best one,
my question/concern is this: since there are sometimes trivial errors that we often
have to live with until we can take our operational systems down long enough to fsck,
will this option automatically put us in ro mode no matter how trivial the
problem is?

Also, when we had the problem earlier today (i.e. the RAID controller didn't fail over
for about 20 minutes), we did stop and fsck. But even so, when we checked after it
was done, it still said the state was "clean with errors".
We tried fscking again with no better results,
though when it started it said...
       "ext3 recovery flag clear but journal has data"
Any advice here?

Thanks
-Sev


-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From adilger at clusterfs.com  Tue Apr 18 08:31:11 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 18 Apr 2006 02:31:11 -0600
Subject: EXT3-fs unexpected failure msg ?
In-Reply-To: <44444119.6000502@bnl.gov>
References: <4443EAF1.8080807@bnl.gov>
	<20060417235156.GO17364@schatzie.adilger.int>
	<Pine.LNX.4.63.0604171859340.25546@zeus.itg.uiuc.edu>
	<44444119.6000502@bnl.gov>
Message-ID: <20060418083111.GP17364@schatzie.adilger.int>

On Apr 17, 2006  21:30 -0400, Sev Binello wrote:
> Damian Menscher wrote:
> >On Mon, 17 Apr 2006, Andreas Dilger wrote:
> >>You really, really, really need to mount your filesystem with
> >>"-o errors=remount-ro", at least to prevent filesystem corruption.
> >>I'm not sure if this is enough to prevent corruption in the case
> >>of your RAID disconnects (if it doesn't generate errors up to the
> >>filesystem, but still discards writes), but it is at least a minimum
> >>requirement.
> >
> >Since this was so strongly-worded, I just did a random spot-check of 
> >some of our filesystems (RHEL4) and discovered they all have:
> >
> >   Errors behavior:          Continue
> >
> >in the superblock (and mount apparently takes that option).  This makes 
> >me curious: if it's so obvious that it should remount-ro on errors, why 
> >is the default (on RHEL4, at least) to continue?

It was only so strongly worded because Sev has had repeated failures of
the RAID hardware resulting in filesystem corruption, and it seems prudent
to stop the filesystem at the first inkling of corruption in this case.
Not all environments see so many problems, and the choice to use remount-ro
is up to the admin (though I believe Debian uses this as the default).

> my question/concern is that since there are sometimes trivial errors that 
> we often have to live with until we can take our operational systems down
> long enough to fsck, will this option automatically put us in ro mode no
> matter how trivial the problem is ?

This will only trigger on cases where there is a consistency error detected
in the ext3 metadata.  It doesn't affect regular IO errors for file data.
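The "Errors behavior: Continue" line quoted above is the stored default from a 16-bit field in the ext2/ext3 superblock. Below is a minimal C sketch of how that field decodes from a raw copy of the superblock, which is the 1024 bytes starting at byte offset 1024 of the device. It assumes the standard on-disk layout: s_errors is a little-endian value at offset 60, where 1 = continue, 2 = remount read-only, and 3 = panic. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_ERRORS_OFFSET 60 /* s_errors within the superblock (standard layout) */

/* Read a little-endian 16-bit field (ext2/ext3 fields are little-endian). */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return a label like the one tune2fs -l prints for the default error
 * behaviour. `sb` points at the 1024-byte superblock (device bytes
 * 1024..2047). A per-mount errors=... option overrides this default. */
const char *errors_behavior(const unsigned char *sb)
{
    assert(sb != NULL);
    switch (get_le16(sb + SB_ERRORS_OFFSET)) {
    case 1:  return "Continue";
    case 2:  return "Remount read-only";
    case 3:  return "Panic";
    default: return "Unknown";
    }
}
```

To change the stored default instead of passing a mount option, use tune2fs's -e option (continue, remount-ro or panic) on the device.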

However, that said, it surprises me that you are getting any kind of errors,
even "trivial" ones, often.  I wouldn't consider a RAID system where you
often get errors to be very reliable.

> Also, when we had the problem earlier today (i.e. the raid controller 
> didn't failover for about 20 mins), we did stop and fsck.
> But even so when we checked after it was done, it still said state was
> "clean with errors" ?

When you run e2fsck, are you specifying the "-f" flag?  For ext3 filesystems,
e2fsck without -f will normally not do a full filesystem check unless
the superblock has been flagged with an error.  This allows e2fsck to be run
against the filesystem at every boot, but normally to do only a journal replay
(seconds at most) unless an error was reported.
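The decision described above can be written as a small, simplified model. This is not e2fsck's actual code. It assumes the standard superblock state flags EXT2_VALID_FS (0x0001, shown as "clean") and EXT2_ERROR_FS (0x0002, shown as "with errors"), so a state value of 3 is what dumpe2fs reports as "clean with errors". The mount-count and check-interval triggers are left out, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATE_VALID 0x0001 /* EXT2_VALID_FS: cleanly unmounted */
#define STATE_ERROR 0x0002 /* EXT2_ERROR_FS: an error was recorded */

enum fsck_action { REPLAY_JOURNAL_ONLY, FULL_CHECK };

/* Hypothetical, simplified model of whether a boot-time e2fsck goes beyond
 * journal replay. Ignores the mount-count and check-interval triggers. */
enum fsck_action boot_fsck_action(uint16_t s_state, bool force_flag)
{
    if (force_flag)                 /* e2fsck -f */
        return FULL_CHECK;
    if (s_state & STATE_ERROR)      /* "... with errors" */
        return FULL_CHECK;
    if (!(s_state & STATE_VALID))   /* "not clean" */
        return FULL_CHECK;
    return REPLAY_JOURNAL_ONLY;
}
```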

> We tried fscking again with no better results,
> though when it started it said...
>       "ext3 recovery flag clear but journal has data"
> any advice here ?

Run "e2fsck -f"?  I haven't seen this unless the superblock was corrupted
and had to be restored from backup or similar.
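For context, here is a sketch of the consistency test behind that message. It is not e2fsck's source. It assumes the standard layouts:

- The filesystem superblock's needs_recovery bit is EXT3_FEATURE_INCOMPAT_RECOVER (0x0004) in the little-endian s_feature_incompat field at offset 96.
- The journal superblock stores s_start, the block where replay would begin, as a big-endian 32-bit value at offset 28. A value of 0 means the journal is empty.

The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_FEATURE_INCOMPAT_OFFSET 96     /* in the ext3 superblock */
#define INCOMPAT_RECOVER           0x0004 /* "needs_recovery" */
#define JSB_START_OFFSET           28     /* s_start in the journal superblock */

/* The ext3 superblock is little-endian. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The journal superblock is big-endian. */
static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* True when the filesystem says no recovery is needed, yet the journal
 * still has a non-zero start block (i.e. it holds data). */
bool recovery_flag_mismatch(const unsigned char *fs_sb,
                            const unsigned char *journal_sb)
{
    assert(fs_sb != NULL && journal_sb != NULL);
    bool needs_recovery =
        get_le32(fs_sb + SB_FEATURE_INCOMPAT_OFFSET) & INCOMPAT_RECOVER;
    bool journal_has_data = get_be32(journal_sb + JSB_START_OFFSET) != 0;
    return !needs_recovery && journal_has_data;
}
```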

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From sev at bnl.gov  Tue Apr 18 13:57:46 2006
From: sev at bnl.gov (Sev Binello)
Date: Tue, 18 Apr 2006 09:57:46 -0400
Subject: EXT3-fs unexpected failure msg ?
In-Reply-To: <20060418083111.GP17364@schatzie.adilger.int>
References: <4443EAF1.8080807@bnl.gov>
	<20060417235156.GO17364@schatzie.adilger.int>
	<Pine.LNX.4.63.0604171859340.25546@zeus.itg.uiuc.edu>
	<44444119.6000502@bnl.gov>
	<20060418083111.GP17364@schatzie.adilger.int>
Message-ID: <4444F05A.8010800@bnl.gov>

Andreas Dilger wrote:
> On Apr 17, 2006  21:30 -0400, Sev Binello wrote:
> 
>>Damian Menscher wrote:
>>
>>>On Mon, 17 Apr 2006, Andreas Dilger wrote:
>>>
>>>>You really, really, really need to mount your filesystem with
>>>>"-o errors=remount-ro", at least to prevent filesystem corruption.
>>>>I'm not sure if this is enough to prevent corruption in the case
>>>>of your RAID disconnects (if it doesn't generate errors up to the
>>>>filesystem, but still discards writes), but it is at least a minimum
>>>>requirement.
>>>
>>>Since this was so strongly-worded, I just did a random spot-check of 
>>>some of our filesystems (RHEL4) and discovered they all have:
>>>
>>>  Errors behavior:          Continue
>>>
>>>in the superblock (and mount apparently takes that option).  This makes 
>>>me curious: if it's so obvious that it should remount-ro on errors, why 
>>>is the default (on RHEL4, at least) to continue?
> 
> 
> It was only so strongly worded because Sev has had repeated failures of
> the RAID hardware resulting in filesystem corruption, and it seems prudent
> to stop the filesystem at the first inkling of corruption in this case.
> Not all environments see so many problems, and the choice to use remount-ro
> is up to the admin (though I believe Debian uses this as the default).
> 
> 
>>my question/concern is that since there are sometimes trivial errors that 
>>we often have to live with until we can take our operational systems down
>>long enough to fsck, will this option automatically put us in ro mode no
>>matter how trivial the problem is ?
> 
> 
> This will only trigger on cases where there is a consistency error detected
> in the ext3 metadata.  It doesn't affect regular IO errors for file data.
> 
OK, I'm assuming this would be any error reported in /var/log/messages
that is preceded by "EXT3-fs".

> However, that said, it surprises me that you are getting any kind of errors,
> even "trivial" ones, often.  I wouldn't consider a RAID system where you
> often get errors to be very reliable.
> 
No argument from us.
> 
>>Also, when we had the problem earlier today (i.e. the raid controller 
>>didn't failover for about 20 mins), we did stop and fsck.
>>But even so when we checked after it was done, it still said state was
>>"clean with errors" ?
> 
> 
> When you run e2fsck, are you specifying the "-f" flag?  For ext3 filesystems,
> an e2fsck (without -f) will normally not do a full filesystem check unless
> the superblock has been flagged with an error.  This allows e2fsck to run
> against the filesystem always at boot, but normally only do journal replay
> (seconds at most) unless there was an error reported.
> 
> 
>>We tried fscking again with no better results,
>>though when it started it said...
>>      "ext3 recovery flag clear but journal has data"
>>any advice here ?
> 
> 
> Run "e2fsck -f"?  I haven't seen this unless the superblock was corrupted
> and had to be restored from backup or similar.
> 
Will try it
Thanks

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
> 


-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov


From agupta at cs.ubc.ca  Tue Apr 18 20:27:50 2006
From: agupta at cs.ubc.ca (Abhishek Gupta)
Date: Tue, 18 Apr 2006 13:27:50 -0700 (PDT)
Subject: Use of journal->j_blk_offset
Message-ID: <Pine.GSO.4.60.0604181314350.17905@cascade.cs.ubc.ca>

Hi everyone,

So this question is more for people who are familiar with the internals of 
ext3.

I notice that the function journal_init_dev() sets the value

journal->j_blk_offset = start

This means that start can be any arbitrary block number on the device.
However, it is never actually used later, in journal_bmap(), where
the value of *retp is set to

*retp = blocknr; /* + journal->j_blk_offset */

A comment at the top of journal_bmap() says that the addition can be
included in the above operation if need be. Is there any specific
reason (related to performance, etc.) why it has not been done?
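To make the question concrete, here is a simplified sketch of the mapping with the offset applied, for an external journal that begins `start` blocks into its device. The struct and function below are hypothetical stand-ins named after the fields in the question. This is not the JBD source, and it does not explain why the addition is commented out there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two journal fields involved. */
struct journal_sketch {
    uint32_t j_blk_offset; /* device block where the journal starts */
    uint32_t j_maxlen;     /* journal length in blocks */
};

/* Map a journal-relative block number to a device block number by
 * adding the journal's starting offset on the device. */
uint32_t journal_bmap_sketch(const struct journal_sketch *j, uint32_t blocknr)
{
    assert(j != 0 && blocknr < j->j_maxlen);
    return blocknr + j->j_blk_offset;
}
```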

Please let me know.

Thanks

Abhishek


From jengelh at linux01.gwdg.de  Mon Apr 10 15:28:18 2006
From: jengelh at linux01.gwdg.de (Jan Engelhardt)
Date: Mon, 10 Apr 2006 17:28:18 +0200 (MEST)
Subject: deleting partition does not effect superblock?
In-Reply-To: <20060406065832.GK13324@lug-owl.de>
References: <1458d9610604052337p2cafa6c8j78fc6da8c5f8be1a@mail.gmail.com>
	<20060406065832.GK13324@lug-owl.de>
Message-ID: <Pine.LNX.4.61.0604101725130.922@yvahk01.tjqt.qr>


>deleted) or otherwise modified. So it's perfectly okay to delete such
>a container (eg. remove start and end from the partition table) and
>recreate it at some time later (by adding those values back to the
>partition table.)  As long as the new container starts at the same
>location, a filesystem driver will be able to find the old
>information. If you start a block later, it won't find its
>superblocks.
>
If using a filesystem with replicated superblocks (ext*, xfs), then ...?
[Includes expecting weird breakage.]
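The replicated ext2/ext3 superblocks can be located as follows. With the sparse_super feature, backups are kept only in block group 1 and in groups whose numbers are powers of 3, 5 or 7; group 0 holds the primary. The sketch below lists candidate backup block numbers under the usual default geometry, which is an assumption:

- blocks per group = 8 × block size;
- the first data block is 1 for 1 KiB blocks and 0 otherwise.

Running mke2fs -n with the same parameters prints the real list for a given device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if n is a power of `base` (1 counts as base^0). */
static bool is_power_of(uint32_t n, uint32_t base)
{
    while (n > 1 && n % base == 0)
        n /= base;
    return n == 1;
}

/* sparse_super rule: group 0 (primary), group 1, and powers of 3, 5, 7. */
static bool group_has_super(uint32_t group)
{
    return group <= 1 || is_power_of(group, 3) ||
           is_power_of(group, 5) || is_power_of(group, 7);
}

/* Store up to `max` backup-superblock block numbers in `out`, assuming
 * the default geometry described above. Returns how many were stored. */
size_t backup_superblocks(uint32_t block_size, uint32_t group_count,
                          uint32_t *out, size_t max)
{
    assert(block_size >= 1024 && (block_size & (block_size - 1)) == 0);
    uint32_t per_group = 8 * block_size;
    uint32_t first_data_block = (block_size == 1024) ? 1 : 0;
    size_t n = 0;
    for (uint32_t g = 1; g < group_count && n < max; g++)
        if (group_has_super(g))
            out[n++] = g * per_group + first_data_block;
    return n;
}
```

For 4 KiB blocks this gives 32768, 98304, 163840, and so on. Like the primary superblock, these positions are counted from the start of the partition, so a backup is only found where expected if the partition starts where it did before.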


Jan Engelhardt
-- 


From dlochart at gmail.com  Wed Apr 19 15:34:17 2006
From: dlochart at gmail.com (Doug Lochart)
Date: Wed, 19 Apr 2006 15:34:17 +0000
Subject: Max filesystem size for ext3 using Adaptec RAID 5 on 64 bit CentOS
Message-ID: <1e71f8880604190834k1759512as301503b7b3586c9b@mail.gmail.com>

We are strategizing a set of backup servers and I have been trying to
deduce what the maximum size of each RAID 5 array should be to match
the OS we are using.  We are currently running CentOS 4.3 64 bit.  We
have planned a 2 TB RAID 5 array for testing but we will need to set
up several larger ones for production.  I have poked around and I see
people mention limits like 2TB max file size and 32TB max filesystem
size. Many of these mentioned 2.4/2.5 kernels, and none specified
32 bit vs 64 bit.

Can someone please provide the following:

max file size for 2.6.9 64 bit CentOS kernel
max partition/filesystem size for the same.

Thanks

Doug

--
What profits a man if he gains the whole world yet loses his soul?


From jlb17 at duke.edu  Wed Apr 19 15:58:50 2006
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Wed, 19 Apr 2006 11:58:50 -0400 (EDT)
Subject: Max filesystem size for ext3 using Adaptec RAID 5 on 64 bit
 CentOS
In-Reply-To: <1e71f8880604190834k1759512as301503b7b3586c9b@mail.gmail.com>
References: <1e71f8880604190834k1759512as301503b7b3586c9b@mail.gmail.com>
Message-ID: <Pine.LNX.4.62.0604191157360.27432@chaos.egr.duke.edu>

On Wed, 19 Apr 2006 at 3:34pm, Doug Lochart wrote

> We are strategizing a set of backup servers and I have been trying to
> deduce what the maximum size of each RAID 5 array should be to match
> the OS we are using.  We are currently running CentOS 4.3 64 bit.  We
> have planned a 2 TB RAID 5 array for testing but we will need to set
> up several larger ones for production.  I have poked around and I see
> people mention limits like 2TB max file size and 32TB max filesystem
> size. Many of these mentioned 2.4/2.5 kernels, and none specified
> 32 bit vs 64 bit.
>
> Can someone please provide the following:
>
> max file size for 2.6.9 64 bit CentOS kernel
> max partition/filesystem size for the same.

http://www.redhat.com/rhel/details/limits/
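Part of the reasoning behind such limits follows from ext3's on-disk format. The sketch below assumes the classic ext3 layout:

- 12 direct block pointers, plus one indirect, one double-indirect and one triple-indirect pointer per inode;
- 4-byte block numbers;
- a 32-bit block count for the whole filesystem.

It computes the pointer-addressable file size and the format's block-count ceiling. Real limits can be lower. File size is also capped near 2 TiB, because i_blocks counts 512-byte sectors in 32 bits. A given kernel or vendor may also support less than the format allows, so the page above remains the authority for CentOS 4.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file reachable through ext3's block pointers, in bytes:
 * 12 direct + single, double and triple indirect, 4-byte entries. */
uint64_t ext3_pointer_max_file(uint32_t block_size)
{
    assert(block_size >= 1024 && (block_size & (block_size - 1)) == 0);
    uint64_t per_block = block_size / 4; /* pointers per indirect block */
    uint64_t blocks = 12 + per_block + per_block * per_block
                      + per_block * per_block * per_block;
    return blocks * block_size;
}

/* Format ceiling from a 32-bit block count: 2^32 blocks of block_size. */
uint64_t ext3_format_max_fs(uint32_t block_size)
{
    return ((uint64_t)1 << 32) * block_size;
}
```

For 4 KiB blocks the pointer limit is about 4 TiB, so the roughly 2 TiB i_blocks cap is the one that binds for files. The format's filesystem ceiling is 16 TiB.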

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


From keld at dkuug.dk  Fri Apr 21 09:55:53 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 21 Apr 2006 11:55:53 +0200
Subject: e2fsck dies with signal 11
In-Reply-To: <20060417111532.GA23376@thunk.org>
References: <20060416123029.GA11999@rap.rap.dk>
	<20060417084125.GC13985@thunk.org>
	<20060417103023.GA6782@rap.rap.dk>
	<20060417111532.GA23376@thunk.org>
Message-ID: <20060421095553.GA28488@rap.rap.dk>

On Mon, Apr 17, 2006 at 07:15:32AM -0400, Theodore Ts'o wrote:
> On Mon, Apr 17, 2006 at 12:30:23PM +0200, Keld Jørn Simonsen wrote:
> > open("/proc/apm", O_RDONLY)             = 4
> 	...
> > read(4,  <unfinished ...>
> 
> This was caused by e2fsck trying to read from /proc/apm to see whether
> or not your system was running on batteries or not.  /proc/apm exists
> (or the open would have returned an error), but reading from it
> apparently causes a kernel oops.  This is definitely a kernel bug, and
> I suspect can be reproduced by the shell command "cat /proc/apm".
> 
> Recompiling the kernel with CONFIG_APM disabled is probably the most
> expedient answer, since for most systems ACPI is more functional (and
> in some cases, required).  Indeed, the APM code has been suffering
> progressive bitrot, which probably explains the kernel oops.  You
> could try sending a complaint to LKML if you really need APM
> functionality for your laptop, and for some reason ACPI is not
> sufficient for your needs.

My problem here has vanished; I don't know why.

But why was e2fsck checking APM?
None of the other filesystems' fsck programs do, AFAIK.

Best regards
Keld


From keld at dkuug.dk  Fri Apr 21 10:00:00 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 21 Apr 2006 12:00:00 +0200
Subject: problem with e2fsck not knowing xfs
Message-ID: <20060421100000.GB28488@rap.rap.dk>

Hi!

I had a problem yesterday with e2fsck.
It reported a bad superblock.
I then tried to use one of the other superblocks,
to no avail.

Then later I remembered that I had switched the fs type to xfs.
Maybe e2fsck could recognize other common fs types
and report this instead?

best regards
keld


From keld at dkuug.dk  Fri Apr 21 10:05:03 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 21 Apr 2006 12:05:03 +0200
Subject: EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3
	filesystem as ext2
Message-ID: <20060421100503.GA28673@rap.rap.dk>

I often get the message:

EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3 filesystem as ext2

I have googled for a reason and a way to solve this,
but have not found anything I could use. Maybe somebody here knows
what to do?

best regards
keld


From herta.vandeneynde at cc.kuleuven.be  Fri Apr 21 15:10:38 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Fri, 21 Apr 2006 17:10:38 +0200
Subject: ext3 data=ordered - good enough for oracle?
Message-ID: <4448F5EE.7030106@cc.kuleuven.be>

Given that the default journaling mode of ext3 (i.e. ordered) does not
guarantee write ordering after a crash, is this journaling mode safe
enough to use for a database such as Oracle?  If so, how are out-of-sync
writes dealt with?

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


From jburgess at uklinux.net  Fri Apr 21 18:39:30 2006
From: jburgess at uklinux.net (Jon Burgess)
Date: Fri, 21 Apr 2006 19:39:30 +0100
Subject: EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3
	filesystem as ext2
In-Reply-To: <20060421100503.GA28673@rap.rap.dk>
References: <20060421100503.GA28673@rap.rap.dk>
Message-ID: <1145644770.28767.7.camel@shark.home>

On Fri, 2006-04-21 at 12:05 +0200, Keld Jørn Simonsen wrote:
> I often get the message:
> 
> EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3 filesystem as ext2
> 
> I have googled for a reason and a way to solve this -
> but not found something I could use. Maybe somebody here knows
> what to do?

This can happen for several reasons:

1) Make sure you specify ext3 in /etc/fstab, e.g.
...
/dev/hda6          /boot       ext3    defaults        1 2

2) 'ext3' may not be compiled into your kernel (or the module may be
missing). What kernel are you using and did you compile it yourself?

3) You may be hard-coding ext2 in some mount command. Make sure that the
filesystem type is either unspecified or set to ext3, e.g.
$ mount -t ext3 /dev/hda6 /mnt/tmp

	Jon


From keld at dkuug.dk  Fri Apr 21 20:05:58 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 21 Apr 2006 22:05:58 +0200
Subject: EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3
	filesystem as ext2
In-Reply-To: <1145644770.28767.7.camel@shark.home>
References: <20060421100503.GA28673@rap.rap.dk>
	<1145644770.28767.7.camel@shark.home>
Message-ID: <20060421200558.GA7256@rap.rap.dk>

On Fri, Apr 21, 2006 at 07:39:30PM +0100, Jon Burgess wrote:
> On Fri, 2006-04-21 at 12:05 +0200, Keld Jørn Simonsen wrote:
> > I often get the message:
> > 
> > EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3 filesystem as ext2
> > 
> > I have googled for a reason and a way to solve this -
> > but not found something I could use. Maybe somebody here knows
> > what to do?
> 
> This can happen for several reasons:-
> 
> 1) Make sure you specify ext3 in /etc/fstab, e.g.
> ...
> /dev/hda6          /boot       ext3    defaults        1 2

It was there as ext3.

> 2) 'ext3' may not be compiled into your kernel (or the module may be
> missing). What kernel are you using and did you compile it yourself?

ext3 is in the kernel. I did compile it myself. It is 2.6.3 from The
Source.

> 3) You may be hard coding ext2 in some mount command. Make sure that the
> filesystem is unspecified or set it to ext3, e.g.
> $ mount -t ext3 /dev/hda6 /mnt/tmp

I have not hard-coded it. Anyway, mount reports it as mounted as ext3.
hda6 is the root fs of my default system on that machine. Perhaps
it is only during boot that it gets mounted as ext2. Still strange, and
then what about the journal, if my system was shut down uncleanly?

best regards
keld


From adilger at clusterfs.com  Sat Apr 22 20:29:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Sat, 22 Apr 2006 14:29:34 -0600
Subject: problem with e2fsck not knowing xfs
In-Reply-To: <20060421100000.GB28488@rap.rap.dk>
References: <20060421100000.GB28488@rap.rap.dk>
Message-ID: <20060422202934.GC6075@schatzie.adilger.int>

On Apr 21, 2006  12:00 +0200, Keld Jørn Simonsen wrote:
> I had problem yesterday with e2fsck. 
> It reported a bad superblock.
> I then tried to use one of the other superblocks.
> To no avail. 
> 
> Then later I remembered that I had switched the fs type to xfs.
> Maybe e2fsck could recognize other common fs types,
> and report this instead?

Or, maybe you can change your /etc/fstab to report the filesystem type
as xfs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From adilger at clusterfs.com  Sat Apr 22 20:30:08 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Sat, 22 Apr 2006 14:30:08 -0600
Subject: EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3
	filesystem as ext2
In-Reply-To: <20060421100503.GA28673@rap.rap.dk>
References: <20060421100503.GA28673@rap.rap.dk>
Message-ID: <20060422203008.GD6075@schatzie.adilger.int>

On Apr 21, 2006  12:05 +0200, Keld Jørn Simonsen wrote:
> I often get the message:
> 
> EXT2-fs warning (device hda6): ext2_fill_super: mounting ext3 filesystem as ext2
> 
> I have googled for a reason and a way to solve this -
> but not found something I could use. Maybe somebody here knows
> what to do?

It means your initrd (or /etc/fstab) is mounting an ext3 filesystem as ext2.
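The warning is printed by the ext2 driver's ext2_fill_super() when the superblock advertises a journal. The sketch below shows that test under the standard on-disk layout, where has_journal is bit 0x0004 (EXT3_FEATURE_COMPAT_HAS_JOURNAL) of the little-endian s_feature_compat field at offset 92 of the superblock. The function name is hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_FEATURE_COMPAT_OFFSET 92     /* s_feature_compat in the superblock */
#define COMPAT_HAS_JOURNAL       0x0004 /* "has_journal" */

/* True if the superblock (device bytes 1024..2047) has a journal, i.e. the
 * filesystem is ext3 even when it is being mounted with the ext2 driver. */
bool has_journal(const unsigned char *sb)
{
    assert(sb != NULL);
    const unsigned char *p = sb + SB_FEATURE_COMPAT_OFFSET;
    uint32_t compat = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                      (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (compat & COMPAT_HAS_JOURNAL) != 0;
}
```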

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From keld at dkuug.dk  Sat Apr 22 20:38:55 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Sat, 22 Apr 2006 22:38:55 +0200
Subject: problem with e2fsck not knowing xfs
In-Reply-To: <20060422202934.GC6075@schatzie.adilger.int>
References: <20060421100000.GB28488@rap.rap.dk>
	<20060422202934.GC6075@schatzie.adilger.int>
Message-ID: <20060422203855.GA17657@rap.rap.dk>

On Sat, Apr 22, 2006 at 02:29:34PM -0600, Andreas Dilger wrote:
> On Apr 21, 2006  12:00 +0200, Keld Jørn Simonsen wrote:
> > I had problem yesterday with e2fsck. 
> > It reported a bad superblock.
> > I then tried to use one of the other superblocks.
> > To no avail. 
> > 
> > Then later I remembered that I had switched the fs type to xfs.
> > Maybe e2fsck could recognize other common fs types,
> > and report this instead?
> 
> Or, maybe you can change your /etc/fstab to report the filesystem type
> as xfs.

Of course I did so, to make it work.

And I promise to never ever again make errors.

Well, I am just asking for a more intelligent error message than
"bad superblock". I think some of the other mkfs programs do so.
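Here is a sketch of the kind of check being asked for. It relies on two well-known on-disk signatures:

- XFS begins its superblock at byte 0 with the magic "XFSB".
- ext2/ext3 store the little-endian magic 0xEF53 at byte 1080, which is offset 56 inside the superblock at 1024.

The function name and return strings are illustrative only. Tools such as blkid do this kind of probing for many more filesystem types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess the filesystem type from the first bytes of a device.
 * `len` should be at least 2048 to cover the ext2/ext3 superblock. */
const char *guess_fs_type(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len >= 4 && memcmp(buf, "XFSB", 4) == 0)
        return "xfs";
    if (len >= 1082 && buf[1080] == 0x53 && buf[1081] == 0xEF)
        return "ext2/ext3";
    return "unknown";
}
```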

best regards
keld


From johann.lombardi at bull.net  Sun Apr 23 00:15:55 2006
From: johann.lombardi at bull.net (Johann Lombardi)
Date: Sun, 23 Apr 2006 02:15:55 +0200
Subject: ext3 data=ordered - good enough for oracle?
In-Reply-To: <4448F5EE.7030106@cc.kuleuven.be>
References: <4448F5EE.7030106@cc.kuleuven.be>
Message-ID: <20060423001555.GK11497@lombardij>

> Given that the default journaling mode of ext3 (i.e. ordered), does not 
> guarantee write ordering after a crash, is this journaling mode safe 
> enough to use for a database such as Oracle?  If so, how are out of sync 
> writes dealt with?

Oracle manages its own I/O cache in userspace and handles the data coherency related
to that. So data=journal is useless in this case.
I guess databases such as Oracle use O_SYNC to control the flushing of data
or even O_DIRECT to bypass the kernel cache.
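For reference, here is what opening a file with O_SYNC looks like from an application. Each write(2) returns only after the data, and the metadata needed to read it back, have been handed to the storage device, to the extent that the kernel and drive honour it. This is a minimal POSIX sketch with a hypothetical function name, not a description of how Oracle itself is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes to `path` opened with O_SYNC, so each write(2) is
 * synchronous. Returns 0 on success, -1 on error (errno is set). */
int write_synchronously(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```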

Johann


From tytso at mit.edu  Sat Apr 22 08:37:57 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Sat, 22 Apr 2006 04:37:57 -0400
Subject: e2fsck dies with signal 11
In-Reply-To: <20060421095553.GA28488@rap.rap.dk>
References: <20060416123029.GA11999@rap.rap.dk>
	<20060417084125.GC13985@thunk.org>
	<20060417103023.GA6782@rap.rap.dk>
	<20060417111532.GA23376@thunk.org>
	<20060421095553.GA28488@rap.rap.dk>
Message-ID: <20060422083756.GA8519@thunk.org>

On Fri, Apr 21, 2006 at 11:55:53AM +0200, Keld Jørn Simonsen wrote:
> But why was e2fsck checking APM?

E2fsck will delay doing a full filesystem check based on number of
mounts or time since last full filesystem check if APM or ACPI reports
that the laptop is running on battery.  Eventually, if the user is
always booting without being connected to AC mains, e2fsck will force
a check anyway, but for most usage patterns it means that the check is
delayed for only a few boots until the user can boot while connected
to AC power.
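Below is a rough model of this policy, not e2fsck's source. While on battery, the periodic check (by mount count or by interval) is deferred, but only until the overdue amount reaches a cutoff. The factor of two used as that cutoff and the function name are assumptions for illustration. A max_mounts of 0 or less, or an interval of 0, disables the corresponding trigger, as with tune2fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of the periodic-check decision. */
bool periodic_check_due(uint32_t mounts, int32_t max_mounts,
                        time_t now, time_t last_check, time_t interval,
                        bool on_battery)
{
    assert(interval >= 0);
    /* Assumed cutoff: on battery, wait until twice the normal limit. */
    int64_t factor = on_battery ? 2 : 1;
    if (max_mounts > 0 && (int64_t)mounts >= factor * max_mounts)
        return true;
    if (interval > 0 &&
        (int64_t)(now - last_check) >= factor * (int64_t)interval)
        return true;
    return false;
}
```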

							- Ted


From herta.vandeneynde at cc.kuleuven.be  Sun Apr 23 21:46:30 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Sun, 23 Apr 2006 23:46:30 +0200
Subject: ext3 data=ordered - good enough for oracle?
In-Reply-To: <20060423001555.GK11497@lombardij>
References: <4448F5EE.7030106@cc.kuleuven.be>
	<20060423001555.GK11497@lombardij>
Message-ID: <444BF5B6.8080505@cc.kuleuven.be>

Johann Lombardi wrote:
>>Given that the default journaling mode of ext3 (i.e. ordered), does not 
>>guarantee write ordering after a crash, is this journaling mode safe 
>>enough to use for a database such as Oracle?  If so, how are out of sync 
>>writes dealt with?
> 
> 
> Oracle manages its own I/O cache in userspace and handles data coherency related
> to that. So data=journal is useless in this case.
> I guess databases such as Oracle use O_SYNC to control the flushing of data
> or even O_DIRECT to bypass the kernel cache.
> 
> Johann
>
Thanks for the reply, Johann, but given that Oracle is still using the 
filesystem (unless you use raw devices or ASM), what good does caching 
do in case of a hard crash?
The O_SYNC and O_DIRECT would help.  Is there any way to verify that 
this is what Oracle actually does?

(The reason I'm asking is that I've had a number of corruptions during the past
year, and I have better things to do at night than restore databases.)

Kind regards,

Herta


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


From mkatiyar at gmail.com  Tue Apr 25 09:45:09 2006
From: mkatiyar at gmail.com (Manish Katiyar)
Date: Tue, 25 Apr 2006 15:15:09 +0530
Subject: Debugging file system using debugfs
Message-ID: <ea11fea30604250245w7951ee9an8e76e9cb174a6f9f@mail.gmail.com>

Hello friends,
        I am trying to learn how to recover files using debugfs. But even
though I delete a file and then run lsdel in debugfs,
it always reports 0 deleted inodes found. Where am I making a mistake?

[root at windce7 linux-2.4.32]# fdisk -l

Disk /dev/hda: 40.0 GB, 40016019456 bytes
255 heads, 63 sectors/track, 4865 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14      4735  37929465   83  Linux
/dev/hda3          4736      4865   1044225   82  Linux swap
[root at windce7 linux-2.4.32]# debugfs /dev/hda2
debugfs 1.32 (09-Nov-2002)
debugfs:  lsdel
 Inode  Owner  Mode    Size    Blocks   Time deleted
0 deleted inodes found.
debugfs:


Please help me... I am new to this.

--
Thanks & Regards,
********************************************
Manish Katiyar
Ozone 2, SP Infocity (Software Park),
New Survey #208 Manjari Stud Farms Ltd.,
Phursungi Village, Haveli Taluka, Saswad Road,
Hadapsar, Pune - 412308, India
***********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20060425/24f94111/attachment.htm>


From johann.lombardi at bull.net  Tue Apr 25 14:49:50 2006
From: johann.lombardi at bull.net (Johann Lombardi)
Date: Tue, 25 Apr 2006 16:49:50 +0200
Subject: ext3 data=ordered - good enough for oracle?
In-Reply-To: <444BF5B6.8080505@cc.kuleuven.be>
References: <4448F5EE.7030106@cc.kuleuven.be>
	<20060423001555.GK11497@lombardij>
	<444BF5B6.8080505@cc.kuleuven.be>
Message-ID: <20060425144950.GB4037@chiva>

Hi Herta,

> Thanks for the reply, Johann, but given that Oracle is still using the
> filesystem (unless you use raw devices or ASM), what good does caching
> do in case of a hard crash?

It's handled at the application level.

> The O_SYNC and O_DIRECT would help.  Is there any way to verify that
> this is what Oracle actually does?

It does:
http://www.oracle.com/technology/tech/linux/htdocs/oracleonlinux_faq.html#8
http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:618260965466
(thread entitled "Commited data not "guaranteed" ?")
http://www.redhat.com/magazine/013nov05/features/oracle/

You can google "Oracle O_SYNC" for more pointers (or check it yourself with strace
or gdb).

Johann


From adilger at clusterfs.com  Tue Apr 25 18:18:59 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 25 Apr 2006 12:18:59 -0600
Subject: Debugging file system using debugfs
In-Reply-To: <ea11fea30604250245w7951ee9an8e76e9cb174a6f9f@mail.gmail.com>
References: <ea11fea30604250245w7951ee9an8e76e9cb174a6f9f@mail.gmail.com>
Message-ID: <20060425181859.GD6075@schatzie.adilger.int>

On Apr 25, 2006  15:15 +0530, Manish Katiyar wrote:
>         I am trying to learn recovering of file using debugfs. But even
> though i delete the file and run lsdel in debugfs
> it always gives me 0 deleted nodes found. Where am i making mistake?.

The ext3 implementation makes it basically impossible to recover deleted
files, unless you search the whole disk looking for the data that you
want to recover.  This is an implementation detail of truncate (ext3 zeroes
the inode's block pointers when it frees the file's blocks), and could
conceivably be fixed (I've discussed an improvement to do this several
times), but nobody has ever worked on it.
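A rough model of why lsdel comes up empty: it assumes debugfs lists a deleted inode only if the inode's deletion time (i_dtime) is set and it still maps at least one block. Because ext3 zeroes the block pointers when it frees the file, the second condition fails. The struct below is a hypothetical, cut-down inode, not the real ext2_inode, and the filter is a simplification of what lsdel checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down inode: only the fields this model looks at. */
struct inode_sketch {
    uint32_t i_dtime;     /* deletion time, 0 while the inode is in use */
    uint32_t i_block[15]; /* 12 direct + 3 indirect block pointers */
};

/* Model of lsdel's filter: deleted, and still pointing at some block. */
bool lsdel_would_list(const struct inode_sketch *ino)
{
    assert(ino != NULL);
    if (ino->i_dtime == 0)
        return false;
    for (size_t i = 0; i < 15; i++)
        if (ino->i_block[i] != 0)
            return true;
    return false;
}
```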

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From zach.brown at oracle.com  Mon Apr 24 16:51:26 2006
From: zach.brown at oracle.com (Zach Brown)
Date: Mon, 24 Apr 2006 09:51:26 -0700
Subject: ext3 data=ordered - good enough for oracle?
In-Reply-To: <4448F5EE.7030106@cc.kuleuven.be>
References: <4448F5EE.7030106@cc.kuleuven.be>
Message-ID: <444D020E.6080907@oracle.com>

Herta Van den Eynde wrote:
> Given that the default journaling mode of ext3 (i.e. ordered), does not
> guarantee write ordering after a crash, is this journaling mode safe
> enough to use for a database such as Oracle?

Yes. The database doesn't rely on any functionality that
data=journal provides and data=ordered doesn't.  data=ordered is fine.

> If so, how are out of sync writes dealt with?

The database, just like ext3/jbd, implements its own consistency
mechanisms by careful write ordering.  ext3 uses in-kernel device APIs
to issue writes and find out when they're on disk, the database ideally
uses O_DIRECT.
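Below is a minimal POSIX sketch of consistency by careful write ordering. The payload is written first and forced to disk with fdatasync(). Only then is the record that declares it valid written. After a crash, a commit marker therefore never vouches for data that did not reach the disk. The file layout, the one-byte "C" marker, and the function names are hypothetical; real databases, and JBD, are far more elaborate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf at offset off; returns 0 on success, -1 on error. */
static int pwrite_all(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Payload first, then a flush, then the commit marker, then a flush. */
int ordered_commit(int fd, const char *payload, size_t len,
                   off_t data_off, off_t commit_off)
{
    assert(fd >= 0 && payload != NULL);
    if (pwrite_all(fd, payload, len, data_off) < 0)
        return -1;
    if (fdatasync(fd) < 0)            /* payload is durable before ... */
        return -1;
    if (pwrite_all(fd, "C", 1, commit_off) < 0)
        return -1;
    return fdatasync(fd);             /* ... the marker that vouches for it */
}
```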

I looked around otn.oracle.com to find a doc that talks about
configuring and verifying AIO+O_DIRECT in the database but got tired of
searching.  You might be able to find something if you're more patient
than I was.

- z


From danield at igb.uiuc.edu  Wed Apr 26 18:33:16 2006
From: danield at igb.uiuc.edu (Daniel Davidson)
Date: Wed, 26 Apr 2006 13:33:16 -0500
Subject: re-linking hard links
Message-ID: <1146076397.3241.9.camel@arthur.igb.uiuc.edu>

Hello, 


I have a situation where I have numerous files with numerous hard links 
to each of them on an ext3 RHEL4.2 system.  Some of these files are 
duplicates of the others.  I would like to re-link all of the 
duplicates to point to a single inode.  For instance if file1 has 
hardlinks link1 and link2, and file2 has hardlinks link3 and link4, I 
need to change it so that link1, link2 (these two are already correct), 
file2, link3, and link4 are all hardlinks to file1.  The only 
information I have to start with are the inode numbers of file1 and 
file2 and the pathnames of file1 and file2. 


Any ideas beyond searching all of the filenames on the system and 
replacing them with the proper link?  That takes a long time. 


thanks, 


Dan 


From herta.vandeneynde at cc.kuleuven.be  Wed Apr 26 23:46:49 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Thu, 27 Apr 2006 01:46:49 +0200
Subject: re-linking hard links
In-Reply-To: <1146076397.3241.9.camel@arthur.igb.uiuc.edu>
References: <1146076397.3241.9.camel@arthur.igb.uiuc.edu>
Message-ID: <44500669.5080103@cc.kuleuven.be>

Daniel Davidson wrote:
> Hello, 
> 
> I have a situation where I have numerous files with numerous hard links 
> to each of them on an ext3 RHEL4.2 system.  Some of these files are 
> duplicates of the others.  I would like to re-link all of the 
> duplicates to point to a single inode.  
> For instance if file1 has 
> hardlinks link1 and link2, and file2 has hardlinks link3 and link4, I 
> need to change it so that link1, link2 (these two are already correct), 
> file2, link3, and link4 are all hardlinks to file1.  The only 
> information I have to start with are the inode numbers of file1 and 
> file2 and the pathnames of file1 and file2. 

Not sure I understand properly.  It looks as though you want to compare 
every file on a given filesystem with every other file on that 
filesystem, and if they are duplicates, replace one of the actual files 
with a hard link to the other file.

> Any ideas beyond searching all of the filenames on the system and 
> replacing them with the proper link?  

Remember that hardlinks cannot cross filesystem boundaries.

> That takes a long time. 

I suppose you could write a script that cksums all files on the 
filesystem, sorts the output, and verifies that two files with the same 
cksum are actually the same.  If they are, it could ask whether it's OK 
to overwrite one of the files with a hardlink to the other.  And yes, 
depending on the size of your filesystem, that would take time.
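The verify-and-replace step might look like the sketch below (the function names are hypothetical). A byte-by-byte comparison rules out checksum collisions. The duplicate name is replaced by linking the kept file under a temporary name and then rename()-ing that over the duplicate, so the pathname never disappears, even briefly. The `.relink.tmp` suffix is an arbitrary choice. Both names must be on the same filesystem, and afterwards the kept file's owner, permissions and timestamps apply to every name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if both files have identical contents, 0 if they differ, -1 on error. */
int files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = -1;
    if (fa != NULL && fb != NULL) {
        char ba[8192], bb[8192];
        size_t na, nb;
        result = 1;
        do {
            na = fread(ba, 1, sizeof ba, fa);
            nb = fread(bb, 1, sizeof bb, fb);
            if (na != nb || memcmp(ba, bb, na) != 0) {
                result = 0;     /* different length or different bytes */
                break;
            }
        } while (na > 0);
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}

/* Make the name `dup` a hard link to the file named `keep`.
 * Returns 0 on success (or if they already share an inode), -1 otherwise. */
int relink_duplicate(const char *keep, const char *dup)
{
    struct stat sk, sd;
    char tmp[4096];

    if (stat(keep, &sk) != 0 || stat(dup, &sd) != 0)
        return -1;
    if (sk.st_dev == sd.st_dev && sk.st_ino == sd.st_ino)
        return 0;               /* already linked; rename() would be a no-op */
    if (files_identical(keep, dup) != 1)
        return -1;              /* never replace a file that merely hashes alike */
    if (snprintf(tmp, sizeof tmp, "%s.relink.tmp", dup) >= (int)sizeof tmp)
        return -1;
    if (link(keep, tmp) != 0)   /* second name for the kept inode */
        return -1;
    if (rename(tmp, dup) != 0) {  /* atomically replaces the duplicate's name */
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

For the case in the original question, call `relink_duplicate("file1", name)` once for each of file2, link3 and link4. The old inode is freed once its last name has been replaced and no process holds it open.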

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


From herta.vandeneynde at cc.kuleuven.be  Wed Apr 26 23:49:22 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Thu, 27 Apr 2006 01:49:22 +0200
Subject: ext3 data=ordered - good enough for oracle?
In-Reply-To: <444D020E.6080907@oracle.com>
References: <4448F5EE.7030106@cc.kuleuven.be> <444D020E.6080907@oracle.com>
Message-ID: <44500702.2040307@cc.kuleuven.be>

Thanks for your replies and pointers, Johann and Zach.  I hope to find 
time next week to study the extra information.

Kind regards,

Herta

Zach Brown wrote:
> Herta Van den Eynde wrote:
> 
>>Given that the default journaling mode of ext3 (i.e. ordered) does not
>>guarantee write ordering after a crash, is this journaling mode safe
>>enough to use for a database such as Oracle?
> 
> 
> Yes, the database doesn't rely on any functionality that
> data=journal provides and data=ordered doesn't.  data=ordered is fine.
> 
> 
>>If so, how are out-of-sync writes dealt with?
> 
> 
> The database, just like ext3/jbd, implements its own consistency
> mechanisms by careful write ordering.  ext3 uses in-kernel device APIs
> to issue writes and find out when they're on disk; the database ideally
> uses O_DIRECT.
> 
> I looked around otn.oracle.com to find a doc that talks about
> configuring and verifying AIO+O_DIRECT in the database but got tired of
> searching.  You might be able to find something if you're more patient
> than I was.
> 
> - z
> 


From AjitN at ami.com  Wed Apr 26 18:20:20 2006
From: AjitN at ami.com (Ajit Narayanan)
Date: Wed, 26 Apr 2006 11:20:20 -0700
Subject: Kernel panic from EXT3 filesystem
Message-ID: <3225AF1B8CBF83459982D4987F1549CE01746C@fre-ops.us.megatrends.com>

Hi All,
 
I'm using FC3 with a 2.6.9 SMP kernel. The root filesystem is ext3 and
the volume is an XFS volume. While doing I/O over an NFS v3 share, with
'watch df' running in parallel, a kernel panic in the slab allocator occurs.
When searching the Internet, I also found similar panics reported in the
free_block function, but could not find a fix for them.

For me, this issue appears intermittently. It happens when doing I/O over
60 NFS shares.

From the panic message, it appears to be an issue in the Linux slab
allocator. Can anyone suggest a solution for this issue? Is it already
addressed in a later kernel such as 2.6.12?
 
------KP start------
kernel: Unable to handle kernel paging request at virtual address 49bec98e
kernel:  printing eip:
kernel: 02140b7c
kernel: *pde = 00000000
kernel: Oops: 0002 [#1]
kernel: SMP
kernel: Modules linked in: xfs i2c_i801 bccfg(U) dvm(U) sg st osst nfsd exportfs lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc iptable_filter ip_tables dm_mod button battery ac sr_mod usb_storage uhci_hcd ehci_hcd e1000 floppy ext3 jbd bcraid aic79xx sd_mod scsi_mod
kernel: CPU:    3
kernel: EIP:    0060:[<02140b7c>]    Tainted: PF  VLI
kernel: EFLAGS: 00010087   (2.6.9-1.667smp)
kernel: EIP is at free_block+0x62/0xd6
kernel: eax: 00000029   ebx: 41db3000   ecx: 03f1ccbb   edx: 00000000
kernel: esi: 41f6f280   edi: 0000003c   ebp: 00000011   esp: 3e384e10
kernel: ds: 007b   es: 007b   ss: 0068
kernel: Process atd (pid: 2831, threadinfo=3e384000 task=3e622930)
kernel: Stack: 41e0d010 39f19000 09fa4540 41e0d010 0000003c 02140c5e 41e0d000 41f6f280
kernel:        41e0d000 09fa4540 41e0d010 00000202 0214102d 09fa4548 00000000 3f1bb808
kernel:        2674d1e0 42d02886 2674d1e0 41f47e00 3d5b2320 3d11ce8c 42d0291e 1a7b8880
kernel: Call Trace:
kernel:  [<02140c5e>] cache_flusharray+0x6e/0x9c
kernel:  [<0214102d>] kfree+0x43/0x51
kernel:  [<42d02886>] free_rb_tree_fname+0x31/0x6c [ext3]
kernel:  [<42d0291e>] ext3_htree_free_dir_info+0x8/0x10 [ext3]
kernel:  [<42d02cd3>] ext3_release_dir+0xf/0x14 [ext3]
kernel:  [<021549ca>] __fput+0x55/0x100
kernel:  [<021536f4>] filp_close+0x59/0x5f
kernel:  [<02121350>] put_files_struct+0x57/0xc0
kernel:  [<02121f4e>] do_exit+0x227/0x3bd
kernel:  [<021221d2>] sys_exit_group+0x0/0xd
kernel:  [<021294f8>] get_signal_to_deliver+0x341/0x369
kernel:  [<02105e6c>] do_signal+0x55/0xd5
kernel:  [<0216396f>] filldir64+0x0/0x122
kernel:  [<0214f08a>] rw_vm+0x27e/0x28c
kernel:  [<0214f3a5>] put_user_size+0x29/0x2d
kernel:  [<02163b30>] sys_getdents64+0x9f/0xa9
kernel:  [<02105f14>] do_notify_resume+0x28/0x38
kernel: Code: 1c 8b 53 04 8b 03 89 50 04 89 02 31 d2 2b 4b 0c c7 03 00 01 10 00 c7 43 04 00 02 20 00 89 c8 f7 b6 b0 00 00 00 89 c1 0f b7 43 14 <66> 89 44 4b 18 8b 43 10 66 89 4b 14 48 85 c0 89 43 10 75 41 8b
------ KP End ------
 
 
Thanks in Advance
Srikumar
 
 

From danield at igb.uiuc.edu  Thu Apr 27 17:27:30 2006
From: danield at igb.uiuc.edu (Daniel Davidson)
Date: Thu, 27 Apr 2006 12:27:30 -0500
Subject: re-linking hard links
In-Reply-To: <44500669.5080103@cc.kuleuven.be>
References: <1146076397.3241.9.camel@arthur.igb.uiuc.edu>
	<44500669.5080103@cc.kuleuven.be>
Message-ID: <1146158850.3876.10.camel@arthur.igb.uiuc.edu>

Nope, I am only using one drive (with a single ext3 filesystem on it).
I know I can do a find -inum, but I was wondering if there was something
more efficient.

I am actually using an md5 checksum to find duplicate files, but then I
need to hunt down all their hard links.

Dan


On Thu, 2006-04-27 at 01:46 +0200, Herta Van den Eynde wrote:
> Daniel Davidson wrote:
> > Hello, 
> > 
> > I have a situation where I have numerous files with numerous hard links 
> > to each of them on an ext3 RHEL4.2 system.  Some of these files are 
> > duplicates of the others.  I would like to re-link all of the 
> > duplicates to point to a single inode.  
> > For instance if file1 has 
> > hardlinks link1 and link2, and file2 has hardlinks link3 and link4, I 
> > need to change it so that link1, link2 (these two are already correct), 
> > file2, link3, and link4 are all hardlinks to file1.  The only 
> > information I have to start with are the inode numbers of file1 and 
> > file2 and the pathnames of file1 and file2. 
> 
> Not sure I understand properly.  It looks as though you want to compare 
> every file on a given filesystem with every other file on that 
> filesystem, and if they are duplicates, replace one of the actual files 
> with a hard link to the other file.
> 
> > Any ideas beyond searching all of the filenames on the system and 
> > replacing them with the proper link?  
> 
> Remember that hardlinks cannot cross filesystem boundaries.
> 
> > That takes a long time. 
> 
> I suppose you could write a script that cksums all files on the 
> filesystem, sorts the output, and verifies that two files with the same 
> cksum are actually the same.  If they are, it could ask whether it's OK 
> to overwrite one of the files with a hardlink to the other.  And yes, 
> depending on the size of your filesystem, that would take time.
> 
> Kind regards,
> 
> Herta


From smb94532543 at w-lan.mine.nu  Thu Apr 27 17:46:39 2006
From: smb94532543 at w-lan.mine.nu (Niki Hammler)
Date: Thu, 27 Apr 2006 19:46:39 +0200
Subject: Whats this for a block?
Message-ID: <4451037F.1020109@stiftingtal.net>

Hi,

I have a question concerning directory entries. I have the following
block, which contains exactly the filenames I had in one particular folder
on the same file system:

http://www.sbox.tugraz.at/home/n/nobaq/ext2.dat

I really hoped that this was a directory block which could point me to
the inodes of the files.

But when I try to extract the data, I only get garbage. I'm reading the
block this way: the first 4 bytes are the pointer to the inode, the next
4 bytes are the length of the name, then the rest is the name itself, and so on.

The first two entries should be '.' and '..', so the name lengths should 
be only 1 and 2, shouldn't they?

Do you know what kind of data block this is? Am I just reading it the wrong way?

Is there a chance to reconstruct useful information from that data block?

Thank you very much in advance,

Nikolaus Hammler


From jburgess at uklinux.net  Thu Apr 27 20:04:28 2006
From: jburgess at uklinux.net (Jon Burgess)
Date: Thu, 27 Apr 2006 21:04:28 +0100
Subject: re-linking hard links
In-Reply-To: <1146158850.3876.10.camel@arthur.igb.uiuc.edu>
References: <1146076397.3241.9.camel@arthur.igb.uiuc.edu>
	<44500669.5080103@cc.kuleuven.be>
	<1146158850.3876.10.camel@arthur.igb.uiuc.edu>
Message-ID: <1146168268.28767.51.camel@shark.home>

On Thu, 2006-04-27 at 12:27 -0500, Daniel Davidson wrote:
> Nope, I am only using one drive (with a single ext3 filesystem on it).
> I know I can do a find -inum, but I was wondering if there was something
> more efficient.
> 
> I am actually using an md5 checksum to find duplicate files, but then I
> need to hunt down all their hard links.
> 
> Dan
> 

There are existing tools which do both the md5sum and hardlinking of
duplicates for you, e.g. http://www.sodarock.com/hardlink/

AFAIK ext3 doesn't have any idea of the md5's of any file, nor is there
any reference from the inode back to the directory entries.
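Because there is no back-reference, finding every name of an inode means walking the directory tree. A single walk can still collect the names of all target inodes at once, instead of running one `find -inum` per file.

The sketch below uses POSIX nftw(); the function name `find_names` and the fixed 64-entry table are assumptions. FTW_PHYS stops it from following symlinks, and FTW_MOUNT keeps it on one filesystem, which is enough since hard links cannot cross filesystems.

```c
#define _XOPEN_SOURCE 700      /* for nftw() */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAX_TARGETS 64         /* simplification: small fixed table */

static ino_t targets[MAX_TARGETS];
static int n_targets;
static dev_t target_dev;       /* device of the filesystem being searched */
static long matches;

static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *info)
{
    (void)info;
    if (type != FTW_F || sb->st_dev != target_dev)
        return 0;
    for (int i = 0; i < n_targets; i++) {
        if (sb->st_ino == targets[i]) {
            printf("%lu\t%s\n", (unsigned long)sb->st_ino, path);
            matches++;
            break;
        }
    }
    return 0;                  /* keep walking */
}

/* Print "inode<TAB>path" for every name under `root` that refers to one of
 * the given inode numbers. Returns the number of names found, or -1. */
long find_names(const char *root, const ino_t *inodes, int count)
{
    struct stat st;
    if (count > MAX_TARGETS || stat(root, &st) != 0)
        return -1;
    for (int i = 0; i < count; i++)
        targets[i] = inodes[i];
    n_targets = count;
    target_dev = st.st_dev;
    matches = 0;
    if (nftw(root, visit, 32, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return matches;
}
```

For example, you might call `ino_t want[] = { 12345, 67890 }; find_names("/data", want, 2);` (the inode numbers and path are placeholders). With GNU find, the same single pass is `find /data -xdev \( -inum 12345 -o -inum 67890 \)`.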

If you were doing this regularly I guess you might be able to cache some
of this info in extended attributes but you'd have to make sure you kept
the info up to date.

	Jon


From sct at redhat.com  Thu Apr 27 20:52:51 2006
From: sct at redhat.com (Stephen C. Tweedie)
Date: Thu, 27 Apr 2006 21:52:51 +0100
Subject: Whats this for a block?
In-Reply-To: <4451037F.1020109@stiftingtal.net>
References: <4451037F.1020109@stiftingtal.net>
Message-ID: <1146171171.16140.43.camel@sisko.sctweedie.blueyonder.co.uk>

Hi,

On Thu, 2006-04-27 at 19:46 +0200, Niki Hammler wrote:

> I have a question concerning directory entries. I have the following
> block, which contains exactly the filenames I had in one particular folder
> on the same file system:
> 
> http://www.sbox.tugraz.at/home/n/nobaq/ext2.dat
> 
> I really hoped that this was a directory block which could point me to
> the inodes of the files.

Yes, it is.

> But when I try to extract the data, I only get garbage. I'm reading the
> block this way: the first 4 bytes are the pointer to the inode, the next
> 4 bytes are the length of the name, then the rest is the name itself, and so on.

Not quite.  It's an ext2_dir_entry_2 struct from
linux/include/linux/ext2_fs.h :

        struct ext2_dir_entry_2 {
        	__le32	inode;			/* Inode number */
        	__le16	rec_len;		/* Directory entry length */
        	__u8	name_len;		/* Name length */
        	__u8	file_type;
        	char	name[EXT2_NAME_LEN];	/* File name */
        };
        
so yes, the first 4 bytes are the inode number; but then you've got a 2-
byte record length, which includes the 8 byte directory entry struct
plus the name length rounded up to the next 4 bytes (to keep the entries
4-byte aligned on disk); then the name length itself, and the inode
type, both of them just 1 byte long.

> The first two entries should be '.' and '..', so the name lengths should 
> be only 1 and 2, shouldn't they?

They are: looking at the "hexdump -C" of the data, I see

00000000  01 40 01 00 0c 00 01 02  2e 00 00 00 b6 c1 08 00  |.@..............|
00000010  0c 00 02 02 2e 2e 00 00  02 40 01 00 14 00 09 01  |.........@......|

so you've got inode number 0x00014001, then 0x000c = 12 bytes record
length, then name length 1 and file_type 2, EXT2_FT_DIR; then "." for
the name.  That completes the first record.  Then you have inode
0x0008c1b6, record length 12, name length 2 and file_type 2, for the
name "..".  And so on.
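A small decoder following that layout may help (a sketch; `parse_dir_block` and `struct dirent_view` are made-up names). The key point is to step from entry to entry by rec_len, not by 8 + name_len. The minimum rec_len is 8 + name_len rounded up to a multiple of 4. That gives 12 for "." and "..", and 8 + 12 = 20 (0x14) for the 9-character name that starts the third record in the dump.

The last entry in a block has a rec_len that reaches to the end of the block. A deleted entry is normally merged into the rec_len of the entry before it, so stale names can remain visible in the raw bytes. Fields are little-endian, so the code assembles them byte by byte rather than casting the buffer to the struct.

```c
#include <stdint.h>
#include <string.h>

struct dirent_view {           /* decoded copy of one ext2_dir_entry_2 */
    uint32_t inode;            /* 0 means an unused entry */
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;        /* 1 = regular file, 2 = directory, ... */
    char     name[256];
};

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode up to `max` entries from a directory block of `len` bytes.
 * Returns the number of entries decoded; stops at an implausible rec_len. */
int parse_dir_block(const unsigned char *blk, size_t len,
                    struct dirent_view *out, int max)
{
    size_t off = 0;
    int n = 0;
    while (off + 8 <= len && n < max) {
        const unsigned char *p = blk + off;
        struct dirent_view *e = &out[n];
        e->inode     = le32(p);
        e->rec_len   = (uint16_t)(p[4] | p[5] << 8);
        e->name_len  = p[6];
        e->file_type = p[7];
        /* rec_len must cover header + name, stay 4-byte aligned, fit in block */
        if (e->rec_len < 8 + e->name_len || e->rec_len % 4 != 0 ||
            off + e->rec_len > len)
            break;
        memcpy(e->name, p + 8, e->name_len);
        e->name[e->name_len] = '\0';
        n++;
        off += e->rec_len;     /* follow the chain, not the name length */
    }
    return n;
}
```

Fed the 32 bytes of the hexdump above, it returns 2 entries: inode 0x14001 named "." and inode 0x8c1b6 named "..", both with rec_len 12 and file_type 2. It then stops, because the third record (rec_len 20) runs past the end of the excerpt.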

Cheers,
 Stephen


From asi.linux at yahoo.com  Sat Apr 29 01:01:07 2006
From: asi.linux at yahoo.com (Muhammad Asif)
Date: Sat, 29 Apr 2006 02:01:07 +0100 (BST)
Subject: Ext3 Variables
Message-ID: <20060429010107.14172.qmail@web38304.mail.mud.yahoo.com>

Hello,
I want to create a script that automatically frees up the proxy partition, i.e. /var. I am able to write that script.
But the problem is that I don't know which variables I can use to check my partition's space in full detail, i.e. remaining size, etc.
One way is df -h and another is fdisk -l /dev/hdxx.
But when I use an if statement, which variable should I use for the comparison? Please help me in this regard. Can you tell me any variables in ext3 that can be used to check a partition's size?
Thanks

Muhammad Asif


From asi.linux at yahoo.com  Sat Apr 29 20:58:12 2006
From: asi.linux at yahoo.com (Muhammad Asif)
Date: Sat, 29 Apr 2006 21:58:12 +0100 (BST)
Subject: ext3 variables
Message-ID: <20060429205812.27767.qmail@web38312.mail.mud.yahoo.com>

Hello,
I want to create a script that automatically frees up the proxy partition, i.e. /var. I am able to write that script.
But the problem is that I don't know which variables I can use to check my partition's space in full detail, i.e. remaining size, etc.
One way is df -h and another is fdisk -l /dev/hdxx.
But when I use an if statement, which variable should I use for the comparison? Please help me in this regard. Can you tell me any variables in ext3 that can be used to check a partition's size?
Thanks

Muhammad Asif


From herta.vandeneynde at cc.kuleuven.be  Sat Apr 29 22:41:08 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Sun, 30 Apr 2006 00:41:08 +0200
Subject: ext3 variables
In-Reply-To: <20060429205812.27767.qmail@web38312.mail.mud.yahoo.com>
References: <20060429205812.27767.qmail@web38312.mail.mud.yahoo.com>
Message-ID: <4453EB84.1040004@cc.kuleuven.be>

Hi Muhammad,

Not sure if I understand properly.  It looks like you're confused about 
the difference in size between what fdisk and df report.
If that's the case: fdisk shows you the size of the disk partitions. 
You create a filesystem on those partitions, and each filesystem has a
specific overhead (superblock, inode tables, ...).  df, in contrast, shows
you what is actually available to the user of the filesystem.
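For the scripting side of the question: df gets its figures from the statvfs()/statfs() system calls, so a C program can test the free space directly rather than parsing df output. In this sketch, `percent_available` and the 10% threshold in the usage line are illustrative assumptions. f_bavail leaves out the blocks ext3 reserves for root (5% by default at mke2fs time), which is why it can be smaller than f_bfree.

```c
#include <sys/statvfs.h>

/* Percentage of the filesystem holding `path` that an unprivileged user can
 * still write to, or -1.0 on error. */
double percent_available(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0 || sv.f_blocks == 0)
        return -1.0;
    /* f_blocks: total data blocks (after filesystem overhead), in f_frsize units
     * f_bfree:  free blocks, including those reserved for root
     * f_bavail: free blocks available to non-root users */
    return 100.0 * (double)sv.f_bavail / (double)sv.f_blocks;
}
```

For example: `if (percent_available("/var") < 10.0) { /* run the cleanup */ }`. From a shell script, `df -P /var` prints the same figures in the POSIX one-line-per-filesystem format, which is easier to parse with awk.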

Kind regards,

Herta

Muhammad Asif wrote:
> Hello,
> I want to create a script that automatically frees up
> the proxy partition, i.e. /var. I am able to write that script.
> But the problem is that I don't know which variables I can use to check
> my partition's space in full detail, i.e. remaining size, etc.
> One way is df -h and another is fdisk -l /dev/hdxx.
> But when I use an if statement, which variable should I use for the
> comparison? Please help me in this regard. Can you tell me any variables
> in ext3 that can be used to check a partition's size?
> Thanks
>
> Muhammad Asif
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
