From juri at koschikode.com Thu May 1 12:27:54 2008 From: juri at koschikode.com (Juri Haberland) Date: Thu, 01 May 2008 14:27:54 +0200 Subject: ext3 limits? In-Reply-To: References: <4815ABC9.1060209@cesca.es> <0d4f9140d1599a445f4caab3cda3ea97.squirrel@housecafe.dyndns.org> <4817379F.2080003@redhat.com> Message-ID: <4819B74A.5080503@koschikode.com> Christian Kujau wrote: > On Tue, April 29, 2008 16:58, Eric Sandeen wrote: >> Actually as of 2.6.18 (or is it .19...), ext3 kernel code should support >> the full 16T, at least in terms of being able to address that many blocks >> w/o corruption. I did a fair amount of work in that time frame to root >> out all the sign overflows etc to allow ext3 to get to 16T. > Then someone should either update the FAQ or better yet put the FAQ on > e2fsprogs.sf.net (or wherever the Ext2 homepage resides). I'll update the FAQ as soon as possible. Juri From lists at nerdbynature.de Thu May 1 21:57:55 2008 From: lists at nerdbynature.de (Christian Kujau) Date: Thu, 1 May 2008 23:57:55 +0200 (CEST) Subject: ext3 limits? In-Reply-To: <4819B74A.5080503@koschikode.com> References: <4815ABC9.1060209@cesca.es> <0d4f9140d1599a445f4caab3cda3ea97.squirrel@housecafe.dyndns.org> <4817379F.2080003@redhat.com> <4819B74A.5080503@koschikode.com> Message-ID: On Thu, 1 May 2008, Juri Haberland wrote: > I'll update the FAQ as soon as possible. Thanks for maintaining the FAQ! Oh, and it's even referenced by the Ext3 Wikipedia article, so it really *is* important ;-) Christian. -- BOFH excuse #199: the curls in your keyboard cord are losing electricity. From tytso at MIT.EDU Fri May 2 14:51:02 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Fri, 2 May 2008 10:51:02 -0400 Subject: ext3 limits? In-Reply-To: References: <4815ABC9.1060209@cesca.es> <0d4f9140d1599a445f4caab3cda3ea97.squirrel@housecafe.dyndns.org> <4817379F.2080003@redhat.com> <4819B74A.5080503@koschikode.com> Message-ID: <20080502145101.GK17365@mit.edu> On Thu, May 01, 2008 at 11:57:55PM +0200, Christian Kujau wrote: > On Thu, 1 May 2008, Juri Haberland wrote: >> I'll update the FAQ as soon as possible. > > Thanks for maintaining the FAQ! Oh, and it's even referenced by the Ext3 > Wikipedia article, so it really *is* important ;-) Juri, would you have any objections to moving the FAQ to ext4.wiki.kernel.org? It might be easier to update the FAQ there. I'll note that since it was last updated in 2004, there are a number of questions that are mostly out of date, such as all references to Linux 2.2, and things like ``How do I convert the journal file from version 1 to version 2?''. With your permission, we can mirror the questions onto ext4.wiki.kernel.org, amd then if you would be willing to put a redirect to new location, that would be great. Note that the ext4.wiki.kernel.org is designed to cover ext2/3 topics in addition to ext4 issues. - Ted From juri at koschikode.com Fri May 2 21:31:51 2008 From: juri at koschikode.com (Juri Haberland) Date: Fri, 02 May 2008 23:31:51 +0200 Subject: ext3 limits? In-Reply-To: <20080502145101.GK17365@mit.edu> References: <4815ABC9.1060209@cesca.es> <0d4f9140d1599a445f4caab3cda3ea97.squirrel@housecafe.dyndns.org> <4817379F.2080003@redhat.com> <4819B74A.5080503@koschikode.com> <20080502145101.GK17365@mit.edu> Message-ID: <481B8847.4050302@koschikode.com> Theodore Tso wrote: > Juri, would you have any objections to moving the FAQ to > ext4.wiki.kernel.org? [...] No, not at all! 
> With your permission, we can mirror the questions onto > ext4.wiki.kernel.org, amd then if you would be willing to put a > redirect to new location, that would be great. Sure, please do so. Please correct me, if I'm wrong: I understand your offer also as an offer to maintain the FAQ in the future. If you want me to maintain the FAQ furthermore, I'll do so, but actually, I'd be glad to pass this responsibility on ;) When you have set up the new FAQ just give me a note and I'll set up a 301 redirect. - Juri PS: a tiny link back to my site would be much appreciated, but not necessary ;) From tytso at MIT.EDU Sat May 3 15:48:42 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Sat, 3 May 2008 11:48:42 -0400 Subject: ext3 limits? In-Reply-To: <481B8847.4050302@koschikode.com> References: <4815ABC9.1060209@cesca.es> <0d4f9140d1599a445f4caab3cda3ea97.squirrel@housecafe.dyndns.org> <4817379F.2080003@redhat.com> <4819B74A.5080503@koschikode.com> <20080502145101.GK17365@mit.edu> <481B8847.4050302@koschikode.com> Message-ID: <20080503154842.GE9841@mit.edu> On Fri, May 02, 2008 at 11:31:51PM +0200, Juri Haberland wrote: > Theodore Tso wrote: > > Juri, would you have any objections to moving the FAQ to > > ext4.wiki.kernel.org? [...] > > No, not at all! > > > With your permission, we can mirror the questions onto > > ext4.wiki.kernel.org, amd then if you would be willing to put a > > redirect to new location, that would be great. > > Sure, please do so. > Please correct me, if I'm wrong: I understand your offer also as an > offer to maintain the FAQ in the future. If you want me to maintain the > FAQ furthermore, I'll do so, but actually, I'd be glad to pass this > responsibility on ;) Well, it's in a Wiki, which means everyone can edit it. :-) If you would like to help out with the Wiki, more help would always be appreciated! - Ted From Harald_Jensas at Dell.com Sat May 3 21:16:02 2008 From: Harald_Jensas at Dell.com (Harald_Jensas at Dell.com) Date: Sat, 3 May 2008 23:16:02 +0200 Subject: Best Practices for recovering corrupt ext2/3 filesystems Message-ID: <87C820D35C176D428A1A8A3B34F6FB86308DD9@uppx3m1.upp.emea.dell.com> Hi All, I am writing a document on Best Practices for recovering corrupt ext2/3 filesystems. The documents start with recommending scheduling backup of partition tables and filesystem images created with e2image. I was hopeing that the subscribers of this list might be able to verify some things for me. Would it be safe to say that fsck would fail to recover a filesystem if the following information from dumpe2fs on the filesystem and the filesystem image differs: Inode count: Block count: Reserved block count: Block size: Group x: (Blocks Y-Z) Are there any other entries that would categorize as "cannot differ" entries? -- Harald Jens?s From ross at biostat.ucsf.edu Thu May 8 18:25:49 2008 From: ross at biostat.ucsf.edu (Ross Boylan) Date: Thu, 08 May 2008 11:25:49 -0700 Subject: LD_PRELOAD library to speed directory traversal Message-ID: <1210271149.20686.18.camel@corn.betterworld.us> Ted Ts'o wrote spd_readdir.c (http://marc.info/?l=mutt-dev&m=107226330912347&w=2) to improve the performance of some applications when reading the directory. As I understand it, the standard system calls may not traverse in inode order, and so can be much slower. I am taking another crack at using this, since some backups are taking over a day for me. If I use it, it would be with an application that will be accessing both ext3 and other (specifically, Reiser) filesystems. 
Does anyone know if it is safe to use in that context?

Thanks.
Ross

For reference, here's the code:

/*
 * readdir accelerator
 *
 * (C) Copyright 2003 by Theodore Ts'o.
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 *
 */

#define ALLOC_STEPSIZE  100
#define MAX_DIRSIZE     0

#define DEBUG

#ifdef DEBUG
#define DEBUG_DIR(x)    {if (do_debug) { x; }}
#else
#define DEBUG_DIR(x)
#endif

#define _GNU_SOURCE
#define __USE_LARGEFILE64

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include <dlfcn.h>

struct dirent_s {
    unsigned long long d_ino;
    long long d_off;
    unsigned short int d_reclen;
    unsigned char d_type;
    char *d_name;
};

struct dir_s {
    DIR *dir;
    int num;
    int max;
    struct dirent_s *dp;
    int pos;
    struct dirent ret_dir;
    struct dirent64 ret_dir64;
};

static int (*real_closedir)(DIR *dir) = 0;
static DIR *(*real_opendir)(const char *name) = 0;
static struct dirent *(*real_readdir)(DIR *dir) = 0;
static struct dirent64 *(*real_readdir64)(DIR *dir) = 0;
static off_t (*real_telldir)(DIR *dir) = 0;
static void (*real_seekdir)(DIR *dir, off_t offset) = 0;
static unsigned long max_dirsize = MAX_DIRSIZE;
#ifdef DEBUG
static int do_debug = 1;
#endif

static void setup_ptr()
{
    char *cp;

    real_opendir = dlsym(RTLD_NEXT, "opendir");
    real_closedir = dlsym(RTLD_NEXT, "closedir");
    real_readdir = dlsym(RTLD_NEXT, "readdir");
    real_readdir64 = dlsym(RTLD_NEXT, "readdir64");
    real_telldir = dlsym(RTLD_NEXT, "telldir");
    real_seekdir = dlsym(RTLD_NEXT, "seekdir");
    if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) {
        max_dirsize = atol(cp);
    }
#ifdef DEBUG
    if (getenv("SPD_READDIR_DEBUG"))
        do_debug++;
#endif
}

static void free_cached_dir(struct dir_s *dirstruct)
{
    int i;

    if (!dirstruct->dp)
        return;

    for (i=0; i < dirstruct->num; i++) {
        free(dirstruct->dp[i].d_name);
    }
    free(dirstruct->dp);
    dirstruct->dp = 0;
}

static int ino_cmp(const void *a, const void *b)
{
    const struct dirent_s *ds_a = (const struct dirent_s *) a;
    const struct dirent_s *ds_b = (const struct dirent_s *) b;
    ino_t i_a, i_b;

    i_a = ds_a->d_ino;
    i_b = ds_b->d_ino;

    if (ds_a->d_name[0] == '.') {
        if (ds_a->d_name[1] == 0)
            i_a = 0;
        else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0))
            i_a = 1;
    }
    if (ds_b->d_name[0] == '.') {
        if (ds_b->d_name[1] == 0)
            i_b = 0;
        else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0))
            i_b = 1;
    }

    return (i_a - i_b);
}

DIR *opendir(const char *name)
{
    DIR *dir;
    struct dir_s *dirstruct;
    struct dirent_s *ds, *dnew;
    struct dirent64 *d;
    struct stat st;

    if (!real_opendir)
        setup_ptr();

    dir = (*real_opendir)(name);
    if (!dir)
        return NULL;

    dirstruct = malloc(sizeof(struct dir_s));
    if (!dirstruct) {
        (*real_closedir)(dir);
        errno = -ENOMEM;
        return NULL;
    }
    dirstruct->num = 0;
    dirstruct->max = 0;
    dirstruct->dp = 0;
    dirstruct->pos = 0;
    dirstruct->dir = 0;

    if (max_dirsize && (stat(name, &st) == 0) &&
        (st.st_size > max_dirsize)) {
        DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
                         st.st_size));
        dirstruct->dir = dir;
        return (DIR *) dirstruct;
    }

    while ((d = (*real_readdir64)(dir)) != NULL) {
        if (dirstruct->num >= dirstruct->max) {
            dirstruct->max += ALLOC_STEPSIZE;
            DEBUG_DIR(printf("Reallocating to size %d\n",
                             dirstruct->max));
            dnew = realloc(dirstruct->dp,
                           dirstruct->max * sizeof(struct dir_s));
            if (!dnew)
                goto nomem;
            dirstruct->dp = dnew;
        }
        ds = &dirstruct->dp[dirstruct->num++];
        ds->d_ino = d->d_ino;
        ds->d_off = d->d_off;
        ds->d_reclen = d->d_reclen;
        ds->d_type = d->d_type;
        if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) {
            dirstruct->num--;
            goto nomem;
        }
        strcpy(ds->d_name, d->d_name);
        DEBUG_DIR(printf("readdir: %lu %s\n",
                         (unsigned long) d->d_ino, d->d_name));
    }
    (*real_closedir)(dir);

    qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp);

    if (do_debug) {
        int i;

        printf("After sorting.\n");
        for (i=0; i < dirstruct->num; i++)
            printf("%lu %s\n",
                   (unsigned long) dirstruct->dp[i].d_ino,
                   dirstruct->dp[i].d_name);
    }

    return ((DIR *) dirstruct);

nomem:
    DEBUG_DIR(printf("No memory, backing off to direct readdir\n"));
    free_cached_dir(dirstruct);
    dirstruct->dir = dir;
    return ((DIR *) dirstruct);
}

int closedir(DIR *dir)
{
    struct dir_s *dirstruct = (struct dir_s *) dir;

    if (dirstruct->dir)
        (*real_closedir)(dirstruct->dir);

    free_cached_dir(dirstruct);
    free(dirstruct);
    return 0;
}

struct dirent *readdir(DIR *dir)
{
    struct dir_s *dirstruct = (struct dir_s *) dir;
    struct dirent_s *ds;

    if (dirstruct->dir)
        return (*real_readdir)(dirstruct->dir);

    if (dirstruct->pos >= dirstruct->num)
        return NULL;

    ds = &dirstruct->dp[dirstruct->pos++];
    dirstruct->ret_dir.d_ino = ds->d_ino;
    dirstruct->ret_dir.d_off = ds->d_off;
    dirstruct->ret_dir.d_reclen = ds->d_reclen;
    dirstruct->ret_dir.d_type = ds->d_type;
    strncpy(dirstruct->ret_dir.d_name, ds->d_name,
            sizeof(dirstruct->ret_dir.d_name));

    return (&dirstruct->ret_dir);
}

struct dirent64 *readdir64(DIR *dir)
{
    struct dir_s *dirstruct = (struct dir_s *) dir;
    struct dirent_s *ds;

    if (dirstruct->dir)
        return (*real_readdir64)(dirstruct->dir);

    if (dirstruct->pos >= dirstruct->num)
        return NULL;

    ds = &dirstruct->dp[dirstruct->pos++];
    dirstruct->ret_dir64.d_ino = ds->d_ino;
    dirstruct->ret_dir64.d_off = ds->d_off;
    dirstruct->ret_dir64.d_reclen = ds->d_reclen;
    dirstruct->ret_dir64.d_type = ds->d_type;
    strncpy(dirstruct->ret_dir64.d_name, ds->d_name,
            sizeof(dirstruct->ret_dir64.d_name));

    return (&dirstruct->ret_dir64);
}

off_t telldir(DIR *dir)
{
    struct dir_s *dirstruct = (struct dir_s *) dir;

    if (dirstruct->dir)
        return (*real_telldir)(dirstruct->dir);

    return ((off_t) dirstruct->pos);
}

void seekdir(DIR *dir, off_t offset)
{
    struct dir_s *dirstruct = (struct dir_s *) dir;

    if (dirstruct->dir) {
        (*real_seekdir)(dirstruct->dir, offset);
        return;
    }

    dirstruct->pos = offset;
}

From adilger at sun.com  Fri May  9 05:49:03 2008
From: adilger at sun.com (Andreas Dilger)
Date: Thu, 08 May 2008 23:49:03 -0600
Subject: LD_PRELOAD library to speed directory traversal
In-Reply-To: <1210271149.20686.18.camel@corn.betterworld.us>
References: <1210271149.20686.18.camel@corn.betterworld.us>
Message-ID: <20080509054903.GW3627@webber.adilger.int>

On May 08, 2008  11:25 -0700, Ross Boylan wrote:
> Ted Ts'o wrote spd_readdir.c
> (http://marc.info/?l=mutt-dev&m=107226330912347&w=2) to improve the
> performance of some applications when reading the directory.  As I
> understand it, the standard system calls may not traverse in inode
> order, and so can be much slower.
>
> I am taking another crack at using this, since some backups are taking
> over a day for me.
>
> If I use it, it would be with an application that will be accessing both
> ext3 and other (specifically, Reiser) filesystems.  Does anyone know if
> it is safe to use in that context?

Yes, in fact for many filesystems the "inode number" does map in some
manner to disk offsets.  Please report your results here, as there isn't
a lot of feedback about using this code.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
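
For anyone who wants to experiment with the library quoted above, a minimal
build-and-run sketch looks roughly like the following; the output file name
and the directory being scanned are only placeholders, and whether a given
application actually benefits depends on it calling opendir()/readdir()
through libc's public symbols:

    # build the interposer as a shared object (dlsym needs -ldl)
    gcc -shared -fpic -o spd_readdir.so spd_readdir.c -ldl

    # preload it for a single command; directory entries come back sorted by inode
    LD_PRELOAD=$PWD/spd_readdir.so du -sh /path/to/large/maildir

    # optional: fall back to plain readdir for directories larger than ~1MB,
    # using the SPD_READDIR_MAX_SIZE knob read in setup_ptr() above
    SPD_READDIR_MAX_SIZE=1048576 LD_PRELOAD=$PWD/spd_readdir.so du -sh /path/to/large/maildir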
From archiveacl at gmail.com Thu May 15 16:11:45 2008 From: archiveacl at gmail.com (Jon Vincent) Date: Thu, 15 May 2008 12:11:45 -0400 Subject: Extended permissions on ext3 Message-ID: <13206c790805150911q3d9d3312h9259ad35568379f5@mail.gmail.com> Hello, I am seeing some strange behavior with extended permissions on ext3. I am writing a file as root and setting a user ACE. I then change to that user and try to access the file based on the ACL that I have set. In the example below, I am setting a user ACE to have no permissions to access the file (---). However, I find that when I access the file as that user, I am able to read it. I find this strange because according to the man page, as soon as it matches the user ACE entry, it should allow or deny access. If I set an identical ACL except I add the "wx" permission bits to the user ACE (-wx), I am rejected (which is what I expect). I am just wondering why I can read the file when I have no permissions (---) set on the user ACE (I expected to be rejected). Examples are below: Example with no permissions for the user ACE: ------------------------------------------------------------------------- [root at jvincent-D800 ~]# cd /tmp [root at jvincent-D800 tmp]# echo "hello world" > file.txt [root at jvincent-D800 tmp]# setfacl -m u::rwx,g::rwx,o::rwx,u:postgres:---,m:--- file.txt [root at jvincent-D800 tmp]# getfacl file.txt # file: file.txt # owner: root # group: root user::rwx user:postgres:--- group::rwx #effective:--- mask::--- other::rwx [root at jvincent-D800 tmp]# ls -l file.txt -rwx---rwx+ 1 root root 12 May 7 11:33 file.txt [root at jvincent-D800 tmp]# su - postgres [postgres at jvincent-D800 ~]$ id uid=501(postgres) gid=501(postgres) groups=501(postgres) [postgres at jvincent-D800 ~]$ whoami postgres [postgres at jvincent-D800 ~]$ cat /tmp/file.txt hello world [postgres at jvincent-D800 ~]$ Example with -wx permissions for the user ACE: ------------------------------------------------------------------------- [root at jvincent-D800 tmp]# cd /tmp [root at jvincent-D800 tmp]# echo "hello world" > file.txt [root at jvincent-D800 tmp]# setfacl -m u::rwx,g::rwx,o::rwx,u:postgres:-wx,m:rwx file.txt [root at jvincent-D800 tmp]# getfacl file.txt # file: file.txt # owner: root # group: root user::rwx user:postgres:-wx group::rwx mask::rwx other::rwx [root at jvincent-D800 tmp]# ls -l file.txt -rwxrwxr--+ 1 root root 12 May 7 13:47 file.txt [root at jvincent-D800 tmp]# su - postgres [postgres at jvincent-D800 ~]$ id uid=501(postgres) gid=501(postgres) groups=501(postgres) [postgres at jvincent-D800 ~]$ whoami postgres [postgres at jvincent-D800 ~]$ cat /tmp/file.txt cat: /tmp/file.txt: Permission denied [postgres at jvincent-D800 ~]$ Thanks! Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: From tytso at mit.edu Sun May 18 16:24:29 2008 From: tytso at mit.edu (Theodore Tso) Date: Sun, 18 May 2008 12:24:29 -0400 Subject: ext3_dx_add_entry: Directory index full! In-Reply-To: <48304CE2.1090808@codewiz.org> References: <48304CE2.1090808@codewiz.org> Message-ID: <20080518162429.GE31413@mit.edu> On Sun, May 18, 2008 at 05:36:02PM +0200, Bernie Innocenti wrote: > > Some background: I'm moving users' Maildirs to a separate filesystem tuned > for small files to increase performance. One of our users intentionally > collected spam for 5 years in one folder and likes it this way. 
> We could easily work it around, but first I'd like to understand whether
> the particular parameters we used trigger a bug in ext3 or if we're just
> hitting a (possibly undocumented) limit.

No, not a bug, but a limit.  Ext3's hash directories are limited to a
depth of 3 blocks, which normally isn't a problem if you are using a 4k
blocksize, since each internal node entry is small: only 8 bytes.  So you
have a fanout of 508 for each internal node, and two levels of internal
nodes get you to over 250,000 4k directory blocks.  But with a 1k
blocksize, the internal node fanout is only 124, so that only gets you a
bit more than 15,000 1k directory blocks.

We could remove this limit at some point; the problem is that Daniel
Phillips's original code had this as a limitation, and fixing it would
mean replacing the tree implementation.  We actually have some code from
Lustre that we could use for this purpose, but to date we've been focused
on some other higher priority items for ext4.

						- Ted

From RossBoylan at stanfordalumni.org  Sun May 18 19:15:22 2008
From: RossBoylan at stanfordalumni.org (Ross Boylan)
Date: Sun, 18 May 2008 12:15:22 -0700
Subject: spd_readdir progress report/questions
Message-ID: <1211136061.1808.22.camel@corn.betterworld.us>

I've been trying to use Ted's spd_readdir.c to accelerate extremely slow
directory traversal on backup.  For some reason, using it seems to stop
the backup entirely.  I'm still investigating; I suspect it may be an
issue with using LD_PRELOAD with a program running as root (but not
setuid) when the library is not owned by root.

1. There was a missing #ifdef in the original code.  I have revised it to be

#ifdef DEBUG
    if (do_debug) {
        int i;

        printf("After sorting.\n");
        for (i=0; i < dirstruct->num; i++)
            printf("%lu %s\n",
                   (unsigned long) dirstruct->dp[i].d_ino,
                   dirstruct->dp[i].d_name);
    }
#endif

I commented out the #define DEBUG statement before building the module
I'm trying to load.  That was how I discovered the need for the guards
shown above.

2. I concocted the following Makefile:

SDIR = /usr/local/src/kernel/ext3-patch
#CFLAGS=-O0 -g
LDFLAGS=-ldl

go2: tester libsd_readdir.so.1
	LD_LIBRARY_PATH=./ LD_PRELOAD=libsd_readdir.so.1 ./tester Makefile

tester: tester.o

spd_readdir.o: spd_readdir.c

tester.o: tester.c

foo: foo.o

foo.o: foo.c

go: libsd_readdir.so.1
	LD_LIBRARY_PATH=./ LD_PRELOAD=libsd_readdir.so.1 tar cf /dev/null $(SDIR)

libsd_readdir.so.1: spd_readdir.c
	$(CC) -shared -fpic -o $@ $^ ${LDFLAGS}

clean:
	rm tester spd_readdir.o libsd_readdir.so.1

Peculiarly, when I didn't use the LDFLAGS argument for the
libsd_readdir.so.1 target, I seemed to be able to start the program, but
when I tried to stop it I got

# LD_LIBRARY_PATH=/usr/local/src/kernel/ext3-patch \
  LD_PRELOAD=libsd_readdir.so.1 start-stop-daemon --stop -v \
  --exec /usr/sbin/bacula-fd -- -c /etc/bacula/bacula-fd.conf
start-stop-daemon: symbol lookup error: /usr/local/src/kernel/ext3-patch/libsd_readdir.so.1: undefined symbol: dlsym

tester is a little test program I wrote to verify I was picking up the
new code.  The test with tar (target go:) didn't show any acceleration,
but apparently tar doesn't use the right calls to benefit from the
library.

3. Originally I was concerned that environment variables set on the
command line for start-stop-daemon would not affect the executable.
However, it seems to work.  Using a non-root test program I was able to
echo the variables from the test program started via start-stop-daemon.
Also, the fact that the daemon (bacula-fd) stops working properly shows
the outer variable setting is having some effect.

4.
start-stop-daemon is part of Debian's infrastructure for launching deamon processes. bacula-fd is part of the backup system: $ ls -l /usr/sbin/bacula-fd -rwxr-xr-x 1 root root 347212 2008-04-15 14:19 /usr/sbin/bacula-fd I believe this means it is not setuid; the docs say setuid programs have a restricted interpretation of LD_PRELOAD. The library I am loading is $ ls -l /usr/local/src/kernel/ext3-patch/libsd_readdir* -rwxr-xr-x 1 ross staff 9083 2008-05-17 23:56 /usr/local/src/kernel/ext3-patch/libsd_readdir.so.1 5. I have the source for bacula-fd. However, I assume that if I simply try to add spd_readdir.c to the build I will get multiply defined symbol conflicts with the calls it shadows. If anyone has any suggestions for how to make it work, or to diagnose the problems, I'd love to hear them. Thanks. Ross Boylan From tytso at MIT.EDU Sun May 18 21:04:35 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Sun, 18 May 2008 17:04:35 -0400 Subject: ext3_dx_add_entry: Directory index full! In-Reply-To: <48304D92.1080306@develer.com> References: <48304CE2.1090808@codewiz.org> <48304D92.1080306@develer.com> Message-ID: <20080518210435.GA8335@mit.edu> On Sun, May 18, 2008 at 05:38:58PM +0200, Bernie Innocenti wrote: > Bernie Innocenti wrote: >> On 2.6.24.4-64.fc8, I createed and mounted a filesystem like this: >> mke2fs -m0 -b 1024 -R stride=64 -I 128 -i 2048 -j -L mail -O >> dir_index,sparse_super -v /dev/sdc1 > > I cannot reproduce it any more if I reformat omitting "-b 1024". > Maybe it would reappear with 200K * 4 = 800K files? Using a filesystem with 4k blocks, and assuming the filenames are of the same average length, you should be able to get approximately 200k * (4**3) = 12.8 million files in a single directory. If you use a 2k block filesystem, the limit will be approximately 200k * (2**3) = 1.6 million files in a single directory. Regards, - Ted P.S. Past a certain point, you really don't want to have that many files in a Maildir directory; if the user is never going to be deleting his SPAM, then you should seriously think about using a Unix mbox style storage scheme. Even with a 1k block filesystem, at 12 million files you'll be wasting 6 gigabytes of disk space of slack space that is totally being wasted since the whole point of using Maildir is to make it easy to delete or replace individual mail messages. If you want to archive all of your SPAM, why use a Maildir format mbox at all? From tytso at MIT.EDU Mon May 19 00:49:55 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Sun, 18 May 2008 20:49:55 -0400 Subject: ext3_dx_add_entry: Directory index full! In-Reply-To: <3A94CB82-6BF7-42DD-96DE-5B6018600077@develer.com> References: <48304CE2.1090808@codewiz.org> <20080518162429.GE31413@mit.edu> <3A94CB82-6BF7-42DD-96DE-5B6018600077@develer.com> Message-ID: <20080519004955.GD8335@mit.edu> On Mon, May 19, 2008 at 01:01:57AM +0200, Stefano Fedrigo wrote: > > So, if I understand correctly, with a 1024 bytes blocksize, dir_index, and > inode size of 128 byte, the maximum number of files in a directory is > 123008. With 4k blocks this limit rises to 8,258,048 files? It depends on the length of the directory entries, and how full the various directory blocks end up getting (which is a function of the directory names used and the per-filesystem hash seed). But in general, the maximum limit goes up as the cube of the blocksize. 
So a 4k filesystem can store roughly 64 times as many files; a filesystem
using 16k blocks (say, on a Power or IA64 architecture) will be able to
store roughly 4,096 times as many files in a single directory.  (So around
819 million files in a single directory, using the original maildir
example).

Seriously, though, past a certain point, if you really want to store that
many small datums, you should really consider a database....

						- Ted

From ross at biostat.ucsf.edu  Mon May 19 01:13:30 2008
From: ross at biostat.ucsf.edu (Ross Boylan)
Date: Sun, 18 May 2008 18:13:30 -0700
Subject: ext3_dx_add_entry: Directory index full!
In-Reply-To: <20080518210435.GA8335@mit.edu>
References: <48304CE2.1090808@codewiz.org> <48304D92.1080306@develer.com> <20080518210435.GA8335@mit.edu>
Message-ID: <1211159610.1808.38.camel@corn.betterworld.us>

On Sun, 2008-05-18 at 17:04 -0400, Theodore Tso wrote:
> P.S.  Past a certain point, you really don't want to have that many
> files in a Maildir directory; if the user is never going to be
> deleting his SPAM, then you should seriously think about using a Unix
> mbox style storage scheme.  Even with a 1k block filesystem, at 12
> million files you'll be wasting 6 gigabytes of disk space of slack
> space that is totally being wasted since the whole point of using
> Maildir is to make it easy to delete or replace individual mail
> messages.  If you want to archive all of your SPAM, why use a Maildir
> format mbox at all?
>
Cyrus, which I am using on ext3, has a maildir-like format in which each
message is a separate file.  (It might even be maildir, but I think not).
This is what has led to my very slow directory traversal times on backup.

Cyrus does not offer a choice of formats in the sense of switching to
something like mbox, and it is intended for large-scale use.
Functionally, it is a mail database.

I suspect many other systems, even those offering a choice of format,
won't let you mix different formats.  So if you like maildir for some
stuff, you may need to use it for all.

So it seems to me it would be useful if the filesystem supported such
usage patterns well.

Ross Boylan
[I cut out most of the distribution list.]

From jelledejong at powercraft.nl  Tue May 27 08:56:32 2008
From: jelledejong at powercraft.nl (Jelle de Jong)
Date: Tue, 27 May 2008 10:56:32 +0200
Subject: needs help, root inode gone after usb bus reset on sata disks
Message-ID: <483BCCC0.5020502@powercraft.nl>

Hello everybody, I am new to this list, so welcome everybody.

In the last two weeks I had two harddisk crashes with my ext2 file
system.  This is roughly what happened with both of the disks:

I plugged my USB to SATA converter into my harddisk that has an ext2
filesystem.  I mounted the partition, went to a directory that had a DVD
image.  I mounted the dvd image in the same directory and started
watching the movie.  After 40 minutes the movie stops.  After some
investigation I saw the ISO image was not mounted anymore, and dmesg was
showing 3 USB bus reset log entries.

I rebooted the computer and tried to mount the usb ext2 disk partition
again.  But it failed.  dmesg and a fsck showed messages about the root
inode being gone!

fsck.ext2 -p /dev/sdd1 did not work; a manual run was needed.

On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1.  It did not fix
my disk, it still had errors, and I lost 35% of my data, but the
partition was mountable again, and the files were in the lost+found
directory.

I don't want this to happen with the second 750GB harddisk, I would like
all my data back.
fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 http://filebin.ca/mczmks/fsck-crash-info.zip What should I do? What commands do you want me to run to provide more info? How can i restore my root inode? Thanks in advance, Jelle de Jong From lists at nerdbynature.de Tue May 27 11:01:12 2008 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 27 May 2008 13:01:12 +0200 (CEST) Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <483BCCC0.5020502@powercraft.nl> References: <483BCCC0.5020502@powercraft.nl> Message-ID: <2ba70e56951502af48871a1d5ccad573.squirrel@housecafe.dyndns.org> On Tue, May 27, 2008 10:56, Jelle de Jong wrote: > image. I mounted the dvd image in the same directory and started watching > the movie. After 40 minutes the movie stops. Maybe someone on the list can tell you how the movie ends, if you tell us the title :-) > dmesg and a fsck showed messages about the root inode being gone! > fsck.ext2 -p /dev/sdd1 did not work manual run is needed. > On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1 did did not Do you still have the logs from these fsck runs? Might be interesting what the exact errors were... > fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 > http://filebin.ca/mczmks/fsck-crash-info.zip Hm, filebin.ca times out, any chance to put this fsck-crash-info.txt somewhere else? (no need to zip, gzip will do fine...) > What should I do? What commands do you want me to run to provide more > info? How can i restore my root inode? 500GB, 750GB...and no backups? Ouch :-\ Christian. -- make bzImage, not war From jelledejong at powercraft.nl Tue May 27 11:25:33 2008 From: jelledejong at powercraft.nl (Jelle de Jong) Date: Tue, 27 May 2008 13:25:33 +0200 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <2ba70e56951502af48871a1d5ccad573.squirrel@housecafe.dyndns.org> References: <483BCCC0.5020502@powercraft.nl> <2ba70e56951502af48871a1d5ccad573.squirrel@housecafe.dyndns.org> Message-ID: <483BEFAD.5060901@powercraft.nl> Christian Kujau wrote: > On Tue, May 27, 2008 10:56, Jelle de Jong wrote: >> image. I mounted the dvd image in the same directory and started watching >> the movie. After 40 minutes the movie stops. > > Maybe someone on the list can tell you how the movie ends, if you tell us > the title :-) I also had the dvd on disk, it was Memento. But I am not so lucky with all the other data on the disk. >> dmesg and a fsck showed messages about the root inode being gone! >> fsck.ext2 -p /dev/sdd1 did not work manual run is needed. >> On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1 did did not > > Do you still have the logs from these fsck runs? Might be interesting what > the exact errors were... The logs will be almost exactly the same as with the second disk (see the gziped file) >> fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 >> http://filebin.ca/mczmks/fsck-crash-info.zip > > Hm, filebin.ca times out, any chance to put this fsck-crash-info.txt > somewhere else? (no need to zip, gzip will do fine...) http://www.powercraft.nl/temp/fsck-crash-info.txt.gz >> What should I do? What commands do you want me to run to provide more >> info? How can i restore my root inode? > > 500GB, 750GB...and no backups? Ouch :-\ Indeed, I really hope we can solve it. If I need to buy an additional 750G disk to make a backup dd image please tell me, it will make a hole in my wallet but if it is necessary... > > Christian. 
Thanks in advance, Jelle From lists at nerdbynature.de Tue May 27 12:23:24 2008 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 27 May 2008 14:23:24 +0200 (CEST) Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <483BEFAD.5060901@powercraft.nl> References: <483BCCC0.5020502@powercraft.nl> <2ba70e56951502af48871a1d5ccad573.squirrel@housecafe.dyndns.org> <483BEFAD.5060901@powercraft.nl> Message-ID: <9e81bbd3329bf74b590195093d4cf727.squirrel@housecafe.dyndns.org> On Tue, May 27, 2008 13:25, Jelle de Jong wrote: > http://www.powercraft.nl/temp/fsck-crash-info.txt.gz So, you've ran fsck but did not try to repair yet, right? If so and you do happen to have a spare 750 GB *now*, try to dd(1) your data to this spare disk: then you can fsck your filesystems as many times as you want to. Are there any (USB-)device related errors in the syslogs? We don't wanna run fsck when the underlying device is unstable. dd(1) would be a good way to find out. Also, is LVM or RAID or sth. like this involved? Searching the net for these errors brought up quite a few hits related to LVM b0rkage... However, let's hope some ext3 hacker will comment on the logs... C. -- make bzImage, not war From tytso at mit.edu Tue May 27 12:47:11 2008 From: tytso at mit.edu (Theodore Tso) Date: Tue, 27 May 2008 08:47:11 -0400 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <483BCCC0.5020502@powercraft.nl> References: <483BCCC0.5020502@powercraft.nl> Message-ID: <20080527124711.GI7515@mit.edu> On Tue, May 27, 2008 at 10:56:32AM +0200, Jelle de Jong wrote: > > I pluged in my USB to SATA converter in my harddisk that has an ext2 > filesystem. I mounted the partition, went to a directory that had a DVD > image. I mounted the dvd image in the same directory and started > watching the movie. After 40 minutes the movie stops. Were you doing anything else on the computer; where there any write operations taking place? If you were just reading from the filesystem, the fact that your filesystem was that badly damaged makes me deeply suspicious about your USB to SATA converter. > fsck.ext2 -p /dev/sdd1 did not work manual run is needed. > > On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1 did did not > fixed my disk it had still errors, I lost 35% of my data, but the > partition was mountable again, and the files where in the lost+found > directory. It looks like garbage was written into your block group descriptors, but since the superblock looked OK, e2fsck -y tried its best, but in this case it may have done more harm than good. (In general, if you see e2fsck asking permission to relocate an inode table; there's something very wrong, and you probably want to say 'n' and do an image level backup of the filesystem before proceeding.) > I don't want this to happen with the second 750GB harddisk, I would like > all my data back. Well, there's no guarantee the same corruption will have taken place on your other hard drive. Running e2fsck -n on that second hard drive and letting an expert examine it would be a good first step, *before* blindly running e2fsck -y. In the next version of e2fsprogs (in development in the git repository), e2fsck will have the ability to create an "undo" log which will make e2fsck -y safer, but personally I've always liked to individually hit return to say 'yes' to each >question. > fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 > http://filebin.ca/mczmks/fsck-crash-info.zip > > What should I do? 
What commands do you want me to run to provide more > info? How can i restore my root inode? So this is from your 500GB disk, as I understand it, right? I'd really need to see the results of "e2fsck -n" *before* you ran "e2fsck -y" but seeing what I see there, taking an image-level backup before you had begun would have been really good idea. I'm not sure there's anythign you'll be able to do about restoring your root inode. But if it was just the root inode that was destroyed, that's actually not a big deal; you'll just have files in lost+found, and you can usually piece together the root directory fairly easily. The bigger problem is the other parts of the filesystem that were corrupted, due to what was apparently a hardware failure. I'm actually really not a fan of USB as an interconnect for disks, because the cables can be flakey; it's not that hard for them to come lose, which may have been what caused your USB<->SATA converter to flake out, but it apparently did so in a very spectacular fashion. When I have time I'll have to add a better automated hueristic to e2fsck try to do this automatically (although even when I make e2fsck -y smarter, there *still* will be cases where a human with experience and intelligence and common sense will do better than a program), but for now, if you see a message about wanting to relocate an inode table, you'll want to look at the output of "dumpe2fs /dev/sdXX", "dumpe2fs -o superblock=32768 /dev/sdXX", and "dumpe2fs -o superblock=98304 /dev/sdXX" (these numbers are assuming a 4k blocksize, which is the common default). If the location of the inode table blocks makes more sense when dumpe2fs is told to look at the backup superblock at 32768, it may be that e2fsck -b 32768 /dev/sdXX will do a better job of recovering the filesystem. - Ted From jelledejong at powercraft.nl Tue May 27 12:52:46 2008 From: jelledejong at powercraft.nl (Jelle de Jong) Date: Tue, 27 May 2008 14:52:46 +0200 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <9e81bbd3329bf74b590195093d4cf727.squirrel@housecafe.dyndns.org> References: <483BCCC0.5020502@powercraft.nl> <2ba70e56951502af48871a1d5ccad573.squirrel@housecafe.dyndns.org> <483BEFAD.5060901@powercraft.nl> <9e81bbd3329bf74b590195093d4cf727.squirrel@housecafe.dyndns.org> Message-ID: <483C041E.7070605@powercraft.nl> Christian Kujau wrote: > On Tue, May 27, 2008 13:25, Jelle de Jong wrote: >> http://www.powercraft.nl/temp/fsck-crash-info.txt.gz > > So, you've ran fsck but did not try to repair yet, right? If so and you do > happen to have a spare 750 GB *now*, try to dd(1) your data to this spare > disk: then you can fsck your filesystems as many times as you want to. I have not run fsck to repair with the previous disk this when wrong, so I will buy a spare 750 GB disk today, and make a dd image from the one disk to the other. > Are there any (USB-)device related errors in the syslogs? We don't wanna > run fsck when the underlying device is unstable. dd(1) would be a good way > to find out. I have seen the usb bus reset errors before but I cant put my finger on the broken part. I have now been copying data from one usb disk to an other usb disk without any usb error messages and this is a 4 hour transfer. > Also, is LVM or RAID or sth. like this involved? Searching the net for > these errors brought up quite a few hits related to LVM b0rkage... No RAID setup, but i am planning a software raid setup soon. > However, let's hope some ext3 hacker will comment on the logs... yes lets hope so... 
> C. From jelledejong at powercraft.nl Tue May 27 13:09:02 2008 From: jelledejong at powercraft.nl (Jelle de Jong) Date: Tue, 27 May 2008 15:09:02 +0200 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <20080527124711.GI7515@mit.edu> References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> Message-ID: <483C07EE.1060905@powercraft.nl> Theodore Tso wrote: > On Tue, May 27, 2008 at 10:56:32AM +0200, Jelle de Jong wrote: >> I pluged in my USB to SATA converter in my harddisk that has an ext2 >> filesystem. I mounted the partition, went to a directory that had a DVD >> image. I mounted the dvd image in the same directory and started >> watching the movie. After 40 minutes the movie stops. > > Were you doing anything else on the computer; where there any write > operations taking place? If you were just reading from the > filesystem, the fact that your filesystem was that badly damaged makes > me deeply suspicious about your USB to SATA converter. There was nothing else going on then watching a DVD form the disk. It may have been an usb issue, but I am using the same sata usb converter for several hours now without any problem. But an usb converter / power failure should not be able to create so much damage when just reading files... > >> fsck.ext2 -p /dev/sdd1 did not work manual run is needed. >> >> On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1 did did not >> fixed my disk it had still errors, I lost 35% of my data, but the >> partition was mountable again, and the files where in the lost+found >> directory. > > It looks like garbage was written into your block group descriptors, > but since the superblock looked OK, e2fsck -y tried its best, but in > this case it may have done more harm than good. (In general, if you > see e2fsck asking permission to relocate an inode table; there's > something very wrong, and you probably want to say 'n' and do an image > level backup of the filesystem before proceeding.) > >> I don't want this to happen with the second 750GB harddisk, I would like >> all my data back. > > Well, there's no guarantee the same corruption will have taken place > on your other hard drive. Running e2fsck -n on that second hard drive > and letting an expert examine it would be a good first step, *before* > blindly running e2fsck -y. > > In the next version of e2fsprogs (in development in the git > repository), e2fsck will have the ability to create an "undo" log > which will make e2fsck -y safer, but personally I've always liked to > individually hit return to say 'yes' to each >question. > >> fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 >> http://filebin.ca/mczmks/fsck-crash-info.zip >> >> What should I do? What commands do you want me to run to provide more >> info? How can i restore my root inode? > > So this is from your 500GB disk, as I understand it, right? I'd > really need to see the results of "e2fsck -n" *before* you ran "e2fsck > -y" but seeing what I see there, taking an image-level backup before > you had begun would have been really good idea. The log is of the second hard drive. I don't have a log of the first hard drive, but it had very very similar outputs. Going to create an image and hope an expert can tell me how to try fixing the file system. > I'm not sure there's anythign you'll be able to do about restoring > your root inode. 
But if it was just the root inode that was > destroyed, that's actually not a big deal; you'll just have files in > lost+found, and you can usually piece together the root directory > fairly easily. > > The bigger problem is the other parts of the filesystem that were > corrupted, due to what was apparently a hardware failure. I'm > actually really not a fan of USB as an interconnect for disks, because > the cables can be flakey; it's not that hard for them to come lose, > which may have been what caused your USB<->SATA converter to flake > out, but it apparently did so in a very spectacular fashion. The reason i used usb connections is power saving, just plug in the hard drive you need. I think I will have an closer look at a placing my harddrives in my server finding some way to hot-swap hot-powerplug drivers enable and disable the power to harddrivers. > When I have time I'll have to add a better automated hueristic to > e2fsck try to do this automatically (although even when I make e2fsck > -y smarter, there *still* will be cases where a human with experience > and intelligence and common sense will do better than a program), but > for now, if you see a message about wanting to relocate an inode > table, you'll want to look at the output of "dumpe2fs /dev/sdXX", > "dumpe2fs -o superblock=32768 /dev/sdXX", and "dumpe2fs -o > superblock=98304 /dev/sdXX" (these numbers are assuming a 4k > blocksize, which is the common default). If the location of the inode > table blocks makes more sense when dumpe2fs is told to look at the > backup superblock at 32768, it may be that e2fsck -b 32768 /dev/sdXX > will do a better job of recovering the filesystem. dumpe2fs /dev/sdXX dumpe2fs -o superblock=32768 /dev/sdXX dumpe2fs -o superblock=98304 /dev/sdXX e2fsck -b 32768 /dev/sdXX Sound like a lot of experimentation, so I am going to make a backup first. I do not have an journaling system on my disk, would it have been a lot saver to have journaling on usb disk? and what about an auto sync option flag for usb disks? Thank you for the information Ted, Jelle From jelledejong at powercraft.nl Wed May 28 14:44:21 2008 From: jelledejong at powercraft.nl (Jelle de Jong) Date: Wed, 28 May 2008 16:44:21 +0200 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <483C07EE.1060905@powercraft.nl> References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> Message-ID: <483D6FC5.30109@powercraft.nl> Jelle de Jong wrote: > Theodore Tso wrote: >> On Tue, May 27, 2008 at 10:56:32AM +0200, Jelle de Jong wrote: >>> I pluged in my USB to SATA converter in my harddisk that has an ext2 >>> filesystem. I mounted the partition, went to a directory that had a DVD >>> image. I mounted the dvd image in the same directory and started >>> watching the movie. After 40 minutes the movie stops. >> >> Were you doing anything else on the computer; where there any write >> operations taking place? If you were just reading from the >> filesystem, the fact that your filesystem was that badly damaged makes >> me deeply suspicious about your USB to SATA converter. > > There was nothing else going on then watching a DVD form the disk. It > may have been an usb issue, but I am using the same sata usb converter > for several hours now without any problem. But an usb converter / power > failure should not be able to create so much damage when just reading > files... > >> >>> fsck.ext2 -p /dev/sdd1 did not work manual run is needed. 
>>> >>> On the first 500GB disk I did an fsck.ext2 -y /dev/sdd1 did did not >>> fixed my disk it had still errors, I lost 35% of my data, but the >>> partition was mountable again, and the files where in the lost+found >>> directory. >> >> It looks like garbage was written into your block group descriptors, >> but since the superblock looked OK, e2fsck -y tried its best, but in >> this case it may have done more harm than good. (In general, if you >> see e2fsck asking permission to relocate an inode table; there's >> something very wrong, and you probably want to say 'n' and do an image >> level backup of the filesystem before proceeding.) >> >>> I don't want this to happen with the second 750GB harddisk, I would like >>> all my data back. >> >> Well, there's no guarantee the same corruption will have taken place >> on your other hard drive. Running e2fsck -n on that second hard drive >> and letting an expert examine it would be a good first step, *before* >> blindly running e2fsck -y. > > >> In the next version of e2fsprogs (in development in the git >> repository), e2fsck will have the ability to create an "undo" log >> which will make e2fsck -y safer, but personally I've always liked to >> individually hit return to say 'yes' to each >question. >> >>> fsck.ext3 -n /dev/sdd1 > fsck-crash-info.txt 2>&1 >>> http://filebin.ca/mczmks/fsck-crash-info.zip >>> >>> What should I do? What commands do you want me to run to provide more >>> info? How can i restore my root inode? >> >> So this is from your 500GB disk, as I understand it, right? I'd >> really need to see the results of "e2fsck -n" *before* you ran "e2fsck >> -y" but seeing what I see there, taking an image-level backup before >> you had begun would have been really good idea. > > The log is of the second hard drive. I don't have a log of the first > hard drive, but it had very very similar outputs. Going to create an > image and hope an expert can tell me how to try fixing the file system. > >> I'm not sure there's anythign you'll be able to do about restoring >> your root inode. But if it was just the root inode that was >> destroyed, that's actually not a big deal; you'll just have files in >> lost+found, and you can usually piece together the root directory >> fairly easily. >> >> The bigger problem is the other parts of the filesystem that were >> corrupted, due to what was apparently a hardware failure. I'm >> actually really not a fan of USB as an interconnect for disks, because >> the cables can be flakey; it's not that hard for them to come lose, >> which may have been what caused your USB<->SATA converter to flake >> out, but it apparently did so in a very spectacular fashion. > > The reason i used usb connections is power saving, just plug in the hard > drive you need. I think I will have an closer look at a placing my > harddrives in my server finding some way to hot-swap hot-powerplug > drivers enable and disable the power to harddrivers. 
> >> When I have time I'll have to add a better automated hueristic to >> e2fsck try to do this automatically (although even when I make e2fsck >> -y smarter, there *still* will be cases where a human with experience >> and intelligence and common sense will do better than a program), but >> for now, if you see a message about wanting to relocate an inode >> table, you'll want to look at the output of "dumpe2fs /dev/sdXX", >> "dumpe2fs -o superblock=32768 /dev/sdXX", and "dumpe2fs -o >> superblock=98304 /dev/sdXX" (these numbers are assuming a 4k >> blocksize, which is the common default). If the location of the inode >> table blocks makes more sense when dumpe2fs is told to look at the >> backup superblock at 32768, it may be that e2fsck -b 32768 /dev/sdXX >> will do a better job of recovering the filesystem. > > dumpe2fs /dev/sdXX > dumpe2fs -o superblock=32768 /dev/sdXX > dumpe2fs -o superblock=98304 /dev/sdXX > e2fsck -b 32768 /dev/sdXX > > Sound like a lot of experimentation, so I am going to make a backup first. > > I do not have an journaling system on my disk, would it have been a lot > saver to have journaling on usb disk? and what about an auto sync option > flag for usb disks? So, it took 14 hours to pump over the 750G to an other disk, but i hope it went ok. I executed the below command created the logs and hope somebody can tell me what to do next? dd if=/dev/sdb of=/dev/sda > dd-run-info.txt 2>&1 dumpe2fs /dev/sda1 > dumpe2fs-info-sda1.txt 2>&1 dumpe2fs -ob 32768 /dev/sda1 > dumpe2fs-32768-info-sda1.txt 2>&1 dumpe2fs -ob 98304 /dev/sda1 > dumpe2fs-98304-info-sda1.txt 2>&1 e2fsck -b 32768 /dev/sda1 (need terminal for interactive repairs) http://www.powercraft.nl/temp/fsck-crash-info.txt.gz http://www.powercraft.nl/temp/dumpe2fs-info-sda1.txt.gz http://www.powercraft.nl/temp/dumpe2fs-32768-info-sda1.txt.gz http://www.powercraft.nl/temp/dumpe2fs-98304-info-sda1.txt.gz Thanks in advance, Jelle From tytso at mit.edu Wed May 28 23:24:52 2008 From: tytso at mit.edu (Theodore Tso) Date: Wed, 28 May 2008 19:24:52 -0400 Subject: needs help, root inode gone after usb bus reset on sata disks In-Reply-To: <483D6FC5.30109@powercraft.nl> References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> Message-ID: <20080528232452.GO6843@mit.edu> On Wed, May 28, 2008 at 04:44:21PM +0200, Jelle de Jong wrote: >> dumpe2fs -o superblock=32768 /dev/sdXX I asked you to do the above, but you did this instead: > dumpe2fs -ob 32768 /dev/sda1 > dumpe2fs-32768-info-sda1.txt 2>&1 Resulting in this: dumpe2fs: No such file or directory while trying to open 32768 So I can't tell if the backup superblock was corrupted, but this is definitely one for the record books. Looking at primary superblock, we see the following: dumpe2fs 1.40-WIP (14-Nov-2006) Filesystem volume name: Last mounted on: ^^<8B> Filesystem UUID: 2e27ae79-fc96-43f5-9758-15ed74dd55fb Filesystem magic number: 0xEF53 Filesystem revision #: 0 (original) Filesystem features: (none) Default mount options: MNTOPT_15 MNTOPT_16 MNTOPT_18 MNTOPT_20 MNTOPT_21 MNTOPT_22 MNTOPT_24 MNTOPT_26 The above, especially the Filesystem features, and default mount options, are garbage. But it looks like the rest of the superblock, including the magic number, the block counts, etc., look sane --- at least in sane enough that it passed e2fsck's sanity checking. 
This is unlike *any* corruption I've seen before; usually there will
be a single bit flip, or the entire disk sector is corrupted, but it's
extremely rare to see this kind of selective corruption.

It's even weirder that this apparently happened on more than one hard
drive.  In any case, I would ditch that USB<->SATA converter as fast
as possible, because there is something seriously wrong.  The other
possibility is that you're running with a buggy kernel, but no one else
has ever reported anything like this, and for two disks to be
corrupted the same way means it's unlikely to be caused by a random
wild pointer or some such.  So if I really had to guess I'd go with
the USB converter, but that's not for certain.

In terms of how to fix it, I would have to see the results of

	dumpe2fs -o superblock=32768 /dev/sdXX

and

	dumpe2fs -o superblock=98304 /dev/sdXX

Hopefully one of the superblocks looks OK.  We could also try manually
repairing the superblock with debugfs, in the worst case, but it'll be
easier if we can fix things via the backup superblock.

						- Ted

From tytso at mit.edu  Wed May 28 23:50:57 2008
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 28 May 2008 19:50:57 -0400
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <20080528232452.GO6843@mit.edu>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu>
Message-ID: <20080528235057.GP6843@mit.edu>

Oh, so I forgot to mention the main thing which caused e2fsck to report
that various inode tables needed to be moved.  The filesystem feature
field was zero'ed out, and looking at the dumpe2fs output, it's clear
the sparse_super feature should have been enabled.  The backup
superblocks should have that feature set; if so, running e2fsck and
telling it to pay attention to one of the backup superblocks should
address the problem.

Again, the really funny thing was how the superblock got corrupted in
such a funny and specific way.  It's almost as if it was corrupted by
Murphy (as in Murphy's Law) himself.

						- Ted

From jelledejong at powercraft.nl  Thu May 29 09:37:25 2008
From: jelledejong at powercraft.nl (Jelle de Jong)
Date: Thu, 29 May 2008 11:37:25 +0200
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <20080528232452.GO6843@mit.edu>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu>
Message-ID: <483E7955.7020508@powercraft.nl>

Theodore Tso wrote:
> On Wed, May 28, 2008 at 04:44:21PM +0200, Jelle de Jong wrote:
>>> dumpe2fs -o superblock=32768 /dev/sdXX
>
> I asked you to do the above, but you did this instead:
>
>> dumpe2fs -ob 32768 /dev/sda1 > dumpe2fs-32768-info-sda1.txt 2>&1
>

My humble excuse: I had to place the disk in a server and this server had
an older version of the dumpe2fs tool that did not support the superblock
option.  I upgraded the server and reran all the tests for you.
dumpe2fs /dev/sda1 > dumpe2fs-info-sda1.txt 2>&1 dumpe2fs -o superblock=32768 /dev/sda1 > dumpe2fs-superblock-32768-info-sda1.txt 2>&1 dumpe2fs -o superblock=98304 /dev/sda1 > dumpe2fs-superblock-98304-info-sda1.txt 2>&1 e2fsck -n /dev/sda1 > e2fsck-n-info-sda1.txt 2>&1 http://www.powercraft.nl/temp/dumpe2fs-info-sda1.txt.gz http://www.powercraft.nl/temp/dumpe2fs-superblock-32768-info-sda1.txt.gz http://www.powercraft.nl/temp/dumpe2fs-superblock-98304-info-sda1.txt.gz http://www.powercraft.nl/temp/e2fsck-n-info-sda1.txt.gz I hope this is the correct information, that can tell you want command is best to run to restore the filesystem with the data. > Resulting in this: > > dumpe2fs: No such file or directory while trying to open 32768 > > So I can't tell if the backup superblock was corrupted, but this is > definitely one for the record books. Looking at primary superblock, > we see the following: > > dumpe2fs 1.40-WIP (14-Nov-2006) > Filesystem volume name: > Last mounted on: ^^<8B> > Filesystem UUID: 2e27ae79-fc96-43f5-9758-15ed74dd55fb > Filesystem magic number: 0xEF53 > Filesystem revision #: 0 (original) > Filesystem features: (none) > Default mount options: MNTOPT_15 MNTOPT_16 MNTOPT_18 MNTOPT_20 MNTOPT_21 MNTOPT_22 MNTOPT_24 MNTOPT_26 > > The above, especially the Filesystem features, and default mount > options, are garbage. But it looks like the rest of the superblock, > including the magic number, the block counts, etc., look sane --- at > least in sane enough that it passed e2fsck's sanity checking. > > This is unlike *any* corruption I've seen before; usually there will > be a single bit flip, or the entire disk sector is corrupted, but it's > extremely rare to see this kind of selective corruption. > > It's even wierder that this apparently happened on more than one hard > drive. In any case, I would ditch that USB<->SATA converter as fast > as possible, because there is something seriously wrong. The other > possibility is that you're running with buggy kernel, but no one else > has ever reported anything like this, and for two disks to be > corrupted the same way means it's unlikely to be caused by a random > wild pointer or some such. So if I really had to guess I'd go with > the USB converter, but that's not for certain. > > In terms of how to fix it, I'd would have to see the results of > > dumpe2fs -o superblock=32768 /dev/sdXX > > and > > dumpe2fs -o superblock=98304 /dev/sdXX > > Hopefully one of the superblocks look OK. We could also try manually > repairing the superblock with debugfs, in the worse case, but it'll be > easier if we can fix things via the backup superblock. > > - Ted I always seem to get the impossible out of Linux tools, but most times this is during quality tests... however this was on "normal usage". I hope it has noting to do with the latest release changes or with corrupt binaries on my client system. Thank you Ted, Kind regards, Jelle From alex at alex.org.uk Thu May 29 10:51:46 2008 From: alex at alex.org.uk (Alex Bligh) Date: Thu, 29 May 2008 11:51:46 +0100 Subject: HTREE corruption Message-ID: <11D5729D8801AFEF019336E6@Ximines.local> A power interruption caused an HTREE problem which appears to upset ext3 fsck. Googling for '"Problem in HTREE directory inode" "has invalid depth"' produces relatively few hits, and none where fsck dies. Log below. Running fsck manually offered the option of clearing 2 HTREE directory inodes, which then produced a clean file system, which checks OK with "fsck -f". Anything to be worried about here? 
Alex

Log of fsck -C -V -R -A -a
Thu May 29 10:32:18 2008

fsck 1.40-WIP (14-Nov-2006)
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/hda7
/dev/hda7: clean, 4905/2443200 files, 272743/4885760 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a -C0 /dev/hda5
/dev/hda5: clean, 57/245280 files, 30013/489974 blocks
[/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a -C0 /dev/hda6
/dev/hda6: clean, 80834/2443200 files, 465978/4885760 blocks
[/sbin/fsck.ext3 (1) -- /home] fsck.ext3 -a -C0 /dev/hda8
/dev/hda8: clean, 18719/12222464 files, 4554802/24416784 blocks
[/sbin/fsck.ext3 (1) -- /var/imap] fsck.ext3 -a -C0 /dev/hda9
/dev/hda9 contains a file system with errors, check forced.
/dev/hda9: Problem in HTREE directory inode 16057109: node (256) has invalid depth
/dev/hda9: Problem in HTREE directory inode 16057109: node (256) has bad max hash
/dev/hda9: Problem in HTREE directory inode 16057109: node (256) not referenced
/dev/hda9: Problem in HTREE directory inode 16057109: node (257) has invalid depth
/dev/hda9: Problem in HTREE directory inode 16057109: node (257) has bad max hash
/dev/hda9: Problem in HTREE directory inode 16057109: node (257) not referenced
(and so on for nodes 258 to 1222)
/dev/hda9: Problem in HTREE directory inode 16057109: node (1223) has invalid depth
/dev/hda9: Problem in HTREE directory inode 16057109: node (1223) has bad max hash
/dev/hda9: Problem in HTREE directory inode 16057109: node (1223) not referenced
/dev/hda9: Problem in HTREE directory inode 16057109: node (1224) has invalid depth
/dev/hda9: Problem in HTREE directory inode 16057109: node (1224) has bad max hash
/dev/hda9: Problem in HTREE directory inode 16057109: node (1224) not referenced
/dev/hda9: Problem in HTREE directory inode 16057109: node (1225fsck died with exit status 4

Thu May 29 10:51:25 2008
----------------

From tytso at mit.edu Thu May 29 12:58:16 2008
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 29 May 2008 08:58:16 -0400
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <483E7955.7020508@powercraft.nl>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu> <483E7955.7020508@powercraft.nl>
Message-ID: <20080529125816.GD8065@mit.edu>

On Thu, May 29, 2008 at 11:37:25AM +0200, Jelle de Jong wrote:
>
> My humble excuse: I had to place the disk in a server, and this server had
> an older version of the dumpe2fs tool that did not support the superblock
> option. I upgraded the server and reran all the tests for you.
>
> dumpe2fs -o superblock=32768 /dev/sda1 > dumpe2fs-superblock-32768-info-sda1.txt 2>&1
> dumpe2fs -o superblock=98304 /dev/sda1 > dumpe2fs-superblock-98304-info-sda1.txt 2>&1

Unfortunately, it looks like the backup superblocks were also
corrupted, and in the same way. Did you *ever* run e2fsck in
read/write mode (i.e., without the -n option) on this filesystem after
you think it had gotten corrupted?

So what I will suggest at this point is that you do the following:

	debugfs -w /dev/sda1
	debugfs: features dir_index filetype sparse_super
	debugfs: quit

Then run "e2fsck -nf /dev/sda1" and make sure the output looks relatively
clean. You should *not* see any messages about needing to relocate
inode tables.

If so, you can then run "e2fsck -f /dev/sda1" to fully recover the
filesystem.

> I always seem to get the impossible out of Linux tools, but most times this
> is during quality tests... however this was on "normal usage". I hope it
> has nothing to do with the latest release changes or with corrupt binaries
> on my client system.

Well, absolutely no one else has reported this problem or anything like
it...

					- Ted

From sandeen at redhat.com Thu May 29 13:33:47 2008
From: sandeen at redhat.com (Eric Sandeen)
Date: Thu, 29 May 2008 08:33:47 -0500
Subject: HTREE corruption
In-Reply-To: <11D5729D8801AFEF019336E6@Ximines.local>
References: <11D5729D8801AFEF019336E6@Ximines.local>
Message-ID: <483EB0BB.7050103@redhat.com>

Alex Bligh wrote:
> A power interruption caused an HTREE problem which appears to upset ext3
> fsck.

You probably want to run with barriers enabled to avoid this next time. ;)
(there was a recent discussion on whether they should be the default, to
protect against situations like this...)

> Googling for '"Problem in HTREE directory inode" "has invalid depth"'
> produces relatively few hits, and none where fsck dies.
>
> Log below.
>
> Running fsck manually offered the option of clearing 2 HTREE directory
> inodes, which then produced a clean file system, which checks OK with
> "fsck -f".
>
> Anything to be worried about here?

Hm, well, I would suppose that fsck should not die, in any case.

You could

	find /mount/point -inum 16057109

to see which directory it was and go see how it's looking, post-fsck....

-Eric

> Alex
>
>
> Log of fsck -C -V -R -A -a
> Thu May 29 10:32:18 2008
>
> fsck 1.40-WIP (14-Nov-2006)
> Checking all file systems.
> [/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/hda7
> /dev/hda7: clean, 4905/2443200 files, 272743/4885760 blocks
> [/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a -C0 /dev/hda5
> /dev/hda5: clean, 57/245280 files, 30013/489974 blocks
> [/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a -C0 /dev/hda6
> /dev/hda6: clean, 80834/2443200 files, 465978/4885760 blocks
> [/sbin/fsck.ext3 (1) -- /home] fsck.ext3 -a -C0 /dev/hda8
> /dev/hda8: clean, 18719/12222464 files, 4554802/24416784 blocks
> [/sbin/fsck.ext3 (1) -- /var/imap] fsck.ext3 -a -C0 /dev/hda9
> /dev/hda9 contains a file system with errors, check forced.
> /dev/hda9: Problem in HTREE directory inode 16057109: node (256) has
> invalid depth
> /dev/hda9: Problem in HTREE directory inode 16057109: node (256) has bad
> max hash
> /dev/hda9: Problem in HTREE directory inode 16057109: node (256) not
> referenced
> /dev/hda9: Problem in HTREE directory inode 16057109: node (257) has
> invalid depth
> /dev/hda9: Problem in HTREE directory inode 16057109: node (257) has bad
> max hash
> /dev/hda9: Problem in HTREE directory inode 16057109: node (257) not
> referenced
> (and so on for nodes 258 to 1222)
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1223) has
> invalid depth
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1223) has bad
> max hash
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1223) not
> referenced
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1224) has
> invalid depth
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1224) has bad
> max hash
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1224) not
> referenced
> /dev/hda9: Problem in HTREE directory inode 16057109: node (1225fsck died
> with exit status 4
>
> Thu May 29 10:51:25 2008
> ----------------
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

From jelledejong at powercraft.nl Thu May 29 14:44:08 2008
From: jelledejong at powercraft.nl (Jelle de Jong)
Date: Thu, 29 May 2008 16:44:08 +0200
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <20080529125816.GD8065@mit.edu>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu> <483E7955.7020508@powercraft.nl> <20080529125816.GD8065@mit.edu>
Message-ID: <483EC138.5090200@powercraft.nl>

Theodore Tso wrote:
> On Thu, May 29, 2008 at 11:37:25AM +0200, Jelle de Jong wrote:
>> My humble excuse: I had to place the disk in a server, and this server had
>> an older version of the dumpe2fs tool that did not support the superblock
>> option. I upgraded the server and reran all the tests for you.
>>
>> dumpe2fs -o superblock=32768 /dev/sda1 > dumpe2fs-superblock-32768-info-sda1.txt 2>&1
>> dumpe2fs -o superblock=98304 /dev/sda1 > dumpe2fs-superblock-98304-info-sda1.txt 2>&1
>
> Unfortunately, it looks like the backup superblocks were also
> corrupted, and in the same way. Did you *ever* run e2fsck in
> read/write mode (i.e., without the -n option) on this filesystem after
> you think it had gotten corrupted?

Yes, I made one mistake: the first time I ran fsck /dev/sdX I answered yes
to the first question (auto-response), then I saw the second question, saw
that it was the same issue as with the previous disk, cancelled the fsck,
and reported my issue to the list.

>
> So what I will suggest at this point is that you do the following:
>
> 	debugfs -w /dev/sda1
> 	debugfs: features dir_index filetype sparse_super
> 	debugfs: quit
>
> Then run "e2fsck -nf /dev/sda1" and make sure the output looks relatively
> clean. You should *not* see any messages about needing to relocate
> inode tables.
>
> If so, you can then run "e2fsck -f /dev/sda1" to fully recover the
> filesystem.
>
>> I always seem to get the impossible out of Linux tools, but most times this
>> is during quality tests... however this was on "normal usage". I hope it
>> has nothing to do with the latest release changes or with corrupt binaries
>> on my client system.
>
> Well, absolutely no one else has reported this problem or anything like
> it...

Ok, this did not go so great...

e2fsck -nf /dev/sda1 > e2fsck-nf-info-sda1.txt 2>&1
e2fsck -fy /dev/sda1 > e2fsck-fy-info-sda1.txt 2>&1

http://www.powercraft.nl/temp/e2fsck-nf-info-sda1.txt.gz
http://www.powercraft.nl/temp/e2fsck-fy-info-sda1.txt.gz (the log is of a second, resumed run)

root at ashley:/media/sda1# ls -hal
total 8.0K
drwxr-xr-x 3 root root 4.0K 2008-05-29 15:28 .
drwxr-xr-x 5 root root  200 2008-05-29 16:10 ..
drwx------ 2 root root 4.0K 2008-05-29 15:28 lost+found
root at ashley:/media/sda1# cd lost+found/
root at ashley:/media/sda1/lost+found# ls -hal
total 8.0K
drwx------ 2 root root 4.0K 2008-05-29 15:28 .
drwxr-xr-x 3 root root 4.0K 2008-05-29 15:28 ..
root at ashley:/media/sda1/lost+found# df -hal
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             688G  8.0K  653G   1% /media/sda1

Nothing there anymore :-S :-S

I am going to restore the backup image back to sda1; it is 750GB, so this
takes a while (14 hours).

Any ideas what went wrong?

From santi at usansolo.net Thu May 29 15:15:06 2008
From: santi at usansolo.net (santi at usansolo.net)
Date: Thu, 29 May 2008 17:15:06 +0200
Subject: Find and delete "broken files/inodes" in ext3
Message-ID: <3fe72256f91b46d7d760af84a8dd1b67@usansolo.net>

Dear Sirs,

I need to remove some broken files/inodes in an ext3 filesystem running
Linux 2.6.18; I can't delete them using userspace utilities:

# ls -l /tmp/apspkgarc/
total 0
?--------- ? ? ? ? ? AdvancedPoll-2.03-30.app.zip
?--------- ? ? ? ? ? Coppermine-1.3.3-45.app.zip
?--------- ? ? ? ? ? joomla-1.0.12-36.app.zip
?--------- ? ? ? ? ? Mambo-4.6.2-8.app.zip
?--------- ? ? ? ? ? osCommerce-2.2ms2-52.app.zip
?--------- ? ? ? ? ? phpBB-2.0.22-19.app.zip
?--------- ? ? ? ? ? phpBook-1.50-26.app.zip
?--------- ? ? ? ? ? WordPress-2.0-21.app.zip

# rm -rf /tmp/apspkgarc
rm: cannot remove directory `/tmp/apspkgarc': Directory not empty

Other times I have solved this problem using debugfs, but I don't know if
it is a safe method:

# umount /dev/i2o/hda4
# debugfs -w /dev/i2o/hda4
debugfs 1.35 (28-Feb-2004)
debugfs: freei /var/run/named.pid

Is "freei" a safe method to delete those files?

Also, I'm looking for a method or application to search for this type of
broken file; right now I'm using these shell one-liners:

export LANG=en_US
find / -type s -prune 2> /tmp/find_broken_files.txt > /dev/null
awk '/No such file or directory/ { print $2; }' /tmp/find_broken_files.txt

All advice is welcome, thanks!!

Regards,

-- 
Santi Saez

From balu.manyam at gmail.com Thu May 29 17:01:30 2008
From: balu.manyam at gmail.com (Balu manyam)
Date: Thu, 29 May 2008 22:31:30 +0530
Subject: Find and delete "broken files/inodes" in ext3
In-Reply-To: <3fe72256f91b46d7d760af84a8dd1b67@usansolo.net>
References: <3fe72256f91b46d7d760af84a8dd1b67@usansolo.net>
Message-ID: <995392220805291001r5a1ea410q615505fbd679b6d3@mail.gmail.com>

OK - I wonder what's causing these "broken" files?

On Thu, May 29, 2008 at 8:45 PM, wrote:

> Dear Sirs,
>
> I need to remove some broken files/inodes in an ext3 filesystem running
> Linux 2.6.18; I can't delete them using userspace utilities:
>
> # ls -l /tmp/apspkgarc/
> total 0
> ?--------- ? ? ? ? ? AdvancedPoll-2.03-30.app.zip
> ?--------- ? ? ? ? ? Coppermine-1.3.3-45.app.zip
> ?--------- ? ? ? ? ? joomla-1.0.12-36.app.zip
> ?--------- ? ? ? ? ? Mambo-4.6.2-8.app.zip
> ?--------- ? ? ? ? ? osCommerce-2.2ms2-52.app.zip
> ?--------- ? ? ? ? ? phpBB-2.0.22-19.app.zip
> ?--------- ? ? ? ? ? phpBook-1.50-26.app.zip
> ?--------- ? ? ? ? ? WordPress-2.0-21.app.zip
>
> # rm -rf /tmp/apspkgarc
> rm: cannot remove directory `/tmp/apspkgarc': Directory not empty
>
>
> Other times I have solved this problem using debugfs, but I don't know if
> it is a safe method:
>
> # umount /dev/i2o/hda4
> # debugfs -w /dev/i2o/hda4
> debugfs 1.35 (28-Feb-2004)
> debugfs: freei /var/run/named.pid
>
> Is "freei" a safe method to delete those files?
>
> Also, I'm looking for a method or application to search for this type of
> broken file; right now I'm using these shell one-liners:
>
> export LANG=en_US
> find / -type s -prune 2> /tmp/find_broken_files.txt > /dev/null
> awk '/No such file or directory/ { print $2; }' /tmp/find_broken_files.txt
>
> All advice is welcome, thanks!!
>
> Regards,
>
> --
> Santi Saez
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tytso at mit.edu Thu May 29 20:01:40 2008
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 29 May 2008 16:01:40 -0400
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <483EC138.5090200@powercraft.nl>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu> <483E7955.7020508@powercraft.nl> <20080529125816.GD8065@mit.edu> <483EC138.5090200@powercraft.nl>
Message-ID: <20080529200140.GF8065@mit.edu>

On Thu, May 29, 2008 at 04:44:08PM +0200, Jelle de Jong wrote:
> Ok, this did not go so great...
>
> e2fsck -nf /dev/sda1 > e2fsck-nf-info-sda1.txt 2>&1
> e2fsck -fy /dev/sda1 > e2fsck-fy-info-sda1.txt 2>&1

OK, this makes no sense whatsoever. In the first pass, it complained
about the root inode being corrupted; that's fine, that was probably
from the initial hardware corruption.

The second time you ran e2fsck, it didn't complain about anything
until pass #5, so apparently the root inode was OK the second time
around. But when you run e2fsck with -n, it opens the device
read-only, so there's no way the filesystem could have changed.

If you didn't run any other commands or do anything else between the
two runs of e2fsck, then you have some serious hardware problem where
the disk is not returning the same data between the first and second
e2fsck run. And if you have weird hardware problems, either with the
hard drive or with how the hard drive is connected to the OS, there's
really nothing e2fsck can do to help you.....

					- Ted

From jelledejong at powercraft.nl Thu May 29 20:15:08 2008
From: jelledejong at powercraft.nl (Jelle de Jong)
Date: Thu, 29 May 2008 22:15:08 +0200
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <20080529200140.GF8065@mit.edu>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu> <483E7955.7020508@powercraft.nl> <20080529125816.GD8065@mit.edu> <483EC138.5090200@powercraft.nl> <20080529200140.GF8065@mit.edu>
Message-ID: <483F0ECC.7030505@powercraft.nl>

Theodore Tso wrote:
> On Thu, May 29, 2008 at 04:44:08PM +0200, Jelle de Jong wrote:
>> Ok, this did not go so great...
>>
>> e2fsck -nf /dev/sda1 > e2fsck-nf-info-sda1.txt 2>&1
>> e2fsck -fy /dev/sda1 > e2fsck-fy-info-sda1.txt 2>&1
>
> OK, this makes no sense whatsoever. In the first pass, it complained
> about the root inode being corrupted; that's fine, that was probably
> from the initial hardware corruption.
>
> The second time you ran e2fsck, it didn't complain about anything
> until pass #5, so apparently the root inode was OK the second time
> around. But when you run e2fsck with -n, it opens the device
> read-only, so there's no way the filesystem could have changed.
>
> If you didn't run any other commands or do anything else between the
> two runs of e2fsck, then you have some serious hardware problem where
> the disk is not returning the same data between the first and second
> e2fsck run. And if you have weird hardware problems, either with the
> hard drive or with how the hard drive is connected to the OS, there's
> really nothing e2fsck can do to help you.....
>

No no, I think I did not give enough info, sorry :-p

I did the following:

debugfs -w /dev/sda1
debugfs: features dir_index filetype sparse_super
debugfs: quit

Then I ran:

e2fsck -nf /dev/sda1

to see if it still wanted to relocate inodes. This was not the case
anymore; however, it still wanted to relocate the root inode...

I then ran:

e2fsck -f /dev/sda1

and manually answered yes to the questions until I had to enter a lot of "y"
(see logs), and killed the program with ctrl-c.

Then I ran the following commands:

e2fsck -nf /dev/sda1 > e2fsck-nf-info-sda1.txt 2>&1
e2fsck -fy /dev/sda1 > e2fsck-fy-info-sda1.txt 2>&1

I am now restoring the backup so we can try again....

Hope things make more sense now.

Peace,

Jelle

From tytso at mit.edu Thu May 29 21:20:48 2008
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 29 May 2008 17:20:48 -0400
Subject: needs help, root inode gone after usb bus reset on sata disks
In-Reply-To: <483F0ECC.7030505@powercraft.nl>
References: <483BCCC0.5020502@powercraft.nl> <20080527124711.GI7515@mit.edu> <483C07EE.1060905@powercraft.nl> <483D6FC5.30109@powercraft.nl> <20080528232452.GO6843@mit.edu> <483E7955.7020508@powercraft.nl> <20080529125816.GD8065@mit.edu> <483EC138.5090200@powercraft.nl> <20080529200140.GF8065@mit.edu> <483F0ECC.7030505@powercraft.nl>
Message-ID: <20080529212048.GI8065@mit.edu>

On Thu, May 29, 2008 at 10:15:08PM +0200, Jelle de Jong wrote:
> I did the following:
>
> debugfs -w /dev/sda1
> debugfs: features dir_index filetype sparse_super
> debugfs: quit
>
> Then I ran:
>
> e2fsck -nf /dev/sda1
>
> to see if it still wanted to relocate inodes. This was not the case
> anymore; however, it still wanted to relocate the root inode...
>
> I then ran:
>
> e2fsck -f /dev/sda1
>
> and manually answered yes to the questions until I had to enter a lot of "y"
> (see logs), and killed the program with ctrl-c.

What answers did you answer yes to? I don't have a log of your
"e2fsck -f /dev/sda1" run, and so I can't tell what happened. The
e2fsck -fy run you gave me was large, but information-free, since it
just had pass #5 messages regarding adjusting accounting information.

If it was just deleting the root inode (because it was corrupted), and
creating a new root inode, that doesn't explain why all of the inodes
disappeared, unless the inode table had somehow gotten completely
zeroed out.

At this point, what I would probably suggest is that you run

	e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2

... and put it someplace where I can download it and take a look at
what the heck happened to your filesystem.
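
[Aside: e2image -r writes a "raw" metadata-only image (superblocks, group
descriptors, bitmaps, inode tables and directory blocks; file data is left
out as zeros), which is why the bzip2 output stays small even for a large
filesystem. A sketch of how such an image can be examined on the receiving
end, reusing the file name from Ted's command; note that the decompressed
image is nominally as large as the whole filesystem, so unpack it where
there is enough free space:

    bunzip2 -k hda1.e2i.bz2      # produces hda1.e2i, large but mostly zeros
    dumpe2fs hda1.e2i | less     # superblock and block-group summaries
    debugfs hda1.e2i             # browse inodes and directories; debugfs opens read-only by default
    e2fsck -fn hda1.e2i          # dry-run check of the image; -n never writes

File contents are not included in a raw image, but directory and file names
are, which is worth knowing before sharing one publicly.]
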
By the way, please look at the "script" command ("man script"); it is very
handy for capturing a record of an interactive session with a program like
e2fsck.

					- Ted

From lists at nerdbynature.de Fri May 30 08:36:06 2008
From: lists at nerdbynature.de (Christian Kujau)
Date: Fri, 30 May 2008 10:36:06 +0200 (CEST)
Subject: Find and delete 'broken files/inodes' in ext3
In-Reply-To: <3fe72256f91b46d7d760af84a8dd1b67@usansolo.net>
References: <3fe72256f91b46d7d760af84a8dd1b67@usansolo.net>
Message-ID:

On Thu, May 29, 2008 17:15, santi at usansolo.net wrote:
> # ls -l /tmp/apspkgarc/
> total 0
> ?--------- ? ? ? ? ? AdvancedPoll-2.03-30.app.zip
> ?--------- ? ? ? ? ? Coppermine-1.3.3-45.app.zip
> ?--------- ? ? ? ? ? joomla-1.0.12-36.app.zip
> ?--------- ? ? ? ? ? Mambo-4.6.2-8.app.zip

Hm, looks like filesystem errors to me. Did you try to e2fsck your /tmp
partition?

> # umount /dev/i2o/hda4
> # debugfs -w /dev/i2o/hda4
> debugfs 1.35 (28-Feb-2004)
> debugfs: freei /var/run/named.pid
>
> Is "freei" a safe method to delete those files?

If your fs is clean, debugfs can do wonders. If it's not, debugfs seems
rather dangerous...

> find / -type s -prune 2> /tmp/find_broken_files.txt > /dev/null

You're searching for sockets, and find will complain to stderr if it
cannot find a referenced file. Again, use e2fsck. No filesystem should
have "broken files"...

C.
-- 
make bzImage, not war
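
[Aside on the "broken files" thread above: directory entries that show up as
"?--------- ? ? ? ? ?" are entries whose inodes can no longer be stat()ed,
typically dangling entries pointing at deleted or unreadable inodes. Note
that debugfs freei only clears the inode's bit in the inode bitmap; it does
not remove the directory entry or clear the inode itself, so used on its own
it tends to create exactly this kind of inconsistency rather than repair it.
The safer route, as Christian suggests, is to let e2fsck fix both sides. A
rough sketch, with /dev/sdXX standing in for whatever device actually holds
/tmp:

    ls -l /tmp/apspkgarc/ 2>&1 | grep '^?'   # list entries whose inodes cannot be stat()ed
    umount /dev/sdXX                         # unmount the affected filesystem (or boot rescue media)
    e2fsck -f /dev/sdXX                      # repairs dangling entries, bitmaps and counts
    mount /dev/sdXX /tmp                     # afterwards the leftover directory should delete normally

Reaching for debugfs -w is only worth considering when e2fsck cannot be run
at all.]
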