From keith at karsites.net  Sun Dec  5 22:10:38 2010
From: keith at karsites.net (Keith Roberts)
Date: Sun, 5 Dec 2010 22:10:38 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
Message-ID: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>

This is stopping my new Centos 5.5 installation from 
booting.

I have dropped into maintenance mode and run e2fsck without 
getting any errors. I used the -c option, and no bad blocks 
were found.

I've run another disk checker program called Vivard, from 
the Ultimate Boot CD disk. That completed without any errors 
as well.

So it is a mystery why I get this error at boot time.

Any ideas what's happening please?

Where can I find out more about ext3, such as inodes, dtime 
and imagic flag?

Kind Regards,

Keith Roberts

-- 
In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with Centos 5.5



From tytso at mit.edu  Mon Dec  6 02:24:53 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Sun, 5 Dec 2010 21:24:53 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
References: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
Message-ID: <20101206022453.GF4273@thunk.org>

On Sun, Dec 05, 2010 at 10:10:38PM +0000, Keith Roberts wrote:
> This is stopping my new Centos 5.5 installation from booting.
> 
> I have dropped into maintenance mode and run e2fsck without getting
> any errors. I used the -c option, and no bad blocks were found.

That error "Inode ... has imagic flag set" is an e2fsck error.  Do you
have more than one file system on your system?  Maybe you checked the
one file system, and the error was on another file system.

That error very often means that part of your inode table has gotten
corrupted, since that flag should never get set during normal
operation.  (It was implemented for AFS file servers.)

	    	    		    	     - Ted



From keith at karsites.net  Mon Dec  6 13:07:38 2010
From: keith at karsites.net (Keith Roberts)
Date: Mon, 6 Dec 2010 13:07:38 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
In-Reply-To: <20101206022453.GF4273@thunk.org>
References: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
	<20101206022453.GF4273@thunk.org>
Message-ID: <alpine.LRH.2.02.1012061254250.3843@www.karsites.net>

On Sun, 5 Dec 2010, Ted Ts'o wrote:

> To: Keith Roberts <keith at karsites.net>
> From: Ted Ts'o <tytso at mit.edu>
> Subject: Re: Inode 196617 has imagic flag set
> 
> On Sun, Dec 05, 2010 at 10:10:38PM +0000, Keith Roberts wrote:
>> This is stopping my new Centos 5.5 installation from booting.
>>
>> I have dropped into maintenance mode and run e2fsck without getting
>> any errors. I used the -c option, and no bad blocks were found.
>
> That error "Inode ... has imagic flag set" is an e2fsck error.  Do you
> have more than one file system on your system?  Maybe you checked the
> one file system, and the error was on another file system.

Absolutely spot on, Ted!

I did have a USB stick plugged in, to boot my Kickstart file 
from. I also added another partition to the USB drive, also 
with a partition label called 'websites'. So I could upload 
my websites to my hosting provider from my laptop.

Maybe e2fsck was getting confused at having two partitions 
with the same partition label on the system?

So I renamed the partition on my USB drive, to my-websites.

+++

I have found it now Ted.

I'm on my laptop, and have run e2fsck on the USB drive.

Here is the output:

[root at karsites ~]# e2label /dev/sdb2
my-websites
[root at karsites ~]# e2fsck -vf -Cd /dev/sdb2
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 196610 has zero dtime.  Fix<y>? no

Inode 196617 is in use, but has dtime set.  Fix<y>? no

Inode 196617 has imagic flag set.  Clear<y>? no

Inode 196618 is in use, but has dtime set.  Fix<y>? no

Inode 196618 has imagic flag set.  Clear<y>? no

Inode 196619 is in use, but has dtime set.  Fix<y>? no

Inode 196619 has imagic flag set.  Clear<y>? no

Inode 196620 is in use, but has dtime set.  Fix<y>? no

Inode 196620 has imagic flag set.  Clear<y>?

So it looks like e2fsck was checking the USB drive at 
bootup, because it inadvertently had the same partition 
label name, 'websites'.

I thought it was my main HDD I was installing Centos onto 
that had the errors.

Whew!

That's cool, because this is a brand new 500GB hard drive.

I shall make sure in future, that there are no conflicts 
with my partition label names - especially with removable 
devices like USB drives.

Thanks for all the help.

Kind Regards,

Keith Roberts

PS Are there any PDF docs that would give me an overview of 
the ext3 FS, and how it works?

> That error very often means that part of your inode table has gotten
> corrupted, since that flag should never get set during normal
> operation.  (It was implemented for AFS file servers.)
>
> 	    	    		    	     - Ted
>

-- 
In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with Centos 5.5



From tytso at mit.edu  Mon Dec  6 13:27:44 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 08:27:44 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To: <alpine.LRH.2.02.1012061254250.3843@www.karsites.net>
References: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
	<20101206022453.GF4273@thunk.org>
	<alpine.LRH.2.02.1012061254250.3843@www.karsites.net>
Message-ID: <20101206132744.GA8135@thunk.org>

On Mon, Dec 06, 2010 at 01:07:38PM +0000, Keith Roberts wrote:
> 
> Maybe e2fsck was getting confused at having two partitions with the
> same partition label on the system?

Well, the blkid library, specifically, was getting confused.  E2fsck
uses the blkid library to map LABEL= and UUID= references to device
names.

One thing that you might do for devices that are always present (which
generally means you're not changing them in your /etc/fstab often) is
to use a UUID= reference instead.  They are definitely less convenient
than labels, but they are also much less likely to cause confusion by
having duplicately labelled file systems.

Granted, no one wants to *type* "mount
UUID=9e132c06-1fd0-4bbd-ad06-5995a8f45b26"; they'd much rather type
"mount LABEL=websites".  But if the only place a
UUID=... specification shows up is in /etc/fstab, it might be worth
it.

Then you can just zap the label on file systems that you only plan to
reference via UUID, and then that reduces the chances for confusion in
the future.

Best regards,

> PS Are there any PDF docs that would give me an overview of the ext3
> FS, and how it works?

Try the list of Articles and Publications here:

https://ext4.wiki.kernel.org/index.php/Publications

						- Ted



From keith at karsites.net  Mon Dec  6 14:20:24 2010
From: keith at karsites.net (Keith Roberts)
Date: Mon, 6 Dec 2010 14:20:24 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
In-Reply-To: <20101206132744.GA8135@thunk.org>
References: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
	<20101206022453.GF4273@thunk.org>
	<alpine.LRH.2.02.1012061254250.3843@www.karsites.net>
	<20101206132744.GA8135@thunk.org>
Message-ID: <alpine.LRH.2.02.1012061414040.4051@www.karsites.net>

On Mon, 6 Dec 2010, Ted Ts'o wrote:

> To: Keith Roberts <keith at karsites.net>
> From: Ted Ts'o <tytso at mit.edu>
> Subject: Re: Inode 196617 has imagic flag set
> 
> On Mon, Dec 06, 2010 at 01:07:38PM +0000, Keith Roberts wrote:
>>
>> Maybe e2fsck was getting confused at having two partitions with the
>> same partition label on the system?
>
> Well, the blkid library, specifically, was getting confused.  E2fsck
> uses the blkid library to map LABEL= and UUID= references to device
> names.

What about if the blkid library was to report some sort of 
'duplicate label name' error, when it maps the devices to 
label names?

That would be a great help.
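
In the meantime, a check along these lines could be scripted. This is only a
sketch: it assumes the output of the blkid command has been captured as text in
the usual `device: TAG="value" ...` form, and the function name, the sample
devices and the sample UUIDs are all made up for illustration.

```python
import re
from collections import defaultdict

# Matches TAG="value" pairs as printed by the blkid command.
_TAG = re.compile(r'(\w+)="([^"]*)"')


def duplicate_labels(blkid_output):
    """Return {label: [devices]} for every label used by more than one device."""
    by_label = defaultdict(list)
    for line in blkid_output.splitlines():
        device, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # not a "device: tags" line
        tags = dict(_TAG.findall(rest))
        label = tags.get("LABEL")
        if label:
            by_label[label].append(device)
    return {label: devs for label, devs in by_label.items() if len(devs) > 1}


# Hypothetical captured output; the UUIDs are placeholders, not real ones.
SAMPLE = """\
/dev/sda1: LABEL="/boot" UUID="00000000-0000-4000-8000-000000000001" TYPE="ext3"
/dev/sda2: LABEL="websites" UUID="00000000-0000-4000-8000-000000000002" TYPE="ext3"
/dev/sdb2: LABEL="websites" UUID="00000000-0000-4000-8000-000000000003" TYPE="ext3"
"""

if __name__ == "__main__":
    for label, devices in duplicate_labels(SAMPLE).items():
        print("duplicate label %r on %s" % (label, ", ".join(devices)))
```

Run against the sample, it reports 'websites' on both /dev/sda2 and /dev/sdb2,
which is the situation I had with the USB stick plugged in.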

> One thing that you might do for devices that are always present (which
> generally means you're not changing them in your /etc/fstab often) is
> to use a UUID= reference instead.  They are definitely less convenient
> than labels, but they are also much less likely to cause confusion by
> having duplicately labelled file systems.
>
> Granted, no one wants to *type* "mount
> UUID=9e132c06-1fd0-4bbd-ad06-5995a8f45b26"; they'd much rather type
> "mount LABEL=websites".  But if the only place a
> UUID=... specification shows up is in /etc/fstab, it might be worth
> it.

Can I set the UUID value to some descriptive text 
(something like a long label name), or is the UUID only 
system generated?

> Then you can just zap the label on file systems that you only plan to
> reference via UUID, and then that reduces the chances for confusion in
> the future.
>
> Best regards,
>
>> PS Are there any PDF docs that would give me an overview of the ext3
>> FS, and how it works?
>
> Try the list of Articles and Publications here:
>
> https://ext4.wiki.kernel.org/index.php/Publications

I'm reading up on that stuff now.

The first link in the list is interesting :)

I think the reason the USB has SO MANY errors on the FS is 
because I possibly unplugged it, before umounting it!

I understand that the umount command flushes any disk I/O 
buffers back to the drive?

Kind Regards,

Keith Roberts

---

In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with Centos 5.5



From tytso at mit.edu  Mon Dec  6 15:46:46 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 10:46:46 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To: <alpine.LRH.2.02.1012061414040.4051@www.karsites.net>
References: <alpine.LRH.2.02.1012052205190.3973@www.karsites.net>
	<20101206022453.GF4273@thunk.org>
	<alpine.LRH.2.02.1012061254250.3843@www.karsites.net>
	<20101206132744.GA8135@thunk.org>
	<alpine.LRH.2.02.1012061414040.4051@www.karsites.net>
Message-ID: <20101206154646.GE8135@thunk.org>

On Mon, Dec 06, 2010 at 02:20:24PM +0000, Keith Roberts wrote:
> 
> Can I set the UUID value to some descriptive text (something like
> a long label name), or is the UUID only system generated?

The UUID is a Universally Unique ID; it is a 128-bit number,
constructed using the rules specified by RFC 4122.  See:

	    http://www.ietf.org/rfc/rfc4122.txt

You can set the UUID to something else, although the only way I
suggest people make use of this functionality is to use "tune2fs -U
random" after doing an image copy of a file system.  The whole point
of a UUID is that it should be universally unique, and humans are
notoriously bad at picking IDs that are truly unique.
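
As an illustration only (this is Python's standard uuid module, not anything
from e2fsprogs), the example UUID quoted earlier in this thread can be taken
apart to show the fixed size and the RFC 4122 version and variant fields.  The
helper name `describe` is made up for this sketch.

```python
import uuid


def describe(u):
    """Summarise the size and the RFC 4122 fields of a uuid.UUID value."""
    return {
        "bits": len(u.bytes) * 8,             # a UUID is always 16 bytes
        "version": u.version,                 # 4 means randomly generated
        "rfc4122": u.variant == uuid.RFC_4122,
    }


# The UUID used as an example earlier in this thread.
EXAMPLE = uuid.UUID("9e132c06-1fd0-4bbd-ad06-5995a8f45b26")

if __name__ == "__main__":
    print(describe(EXAMPLE))
    # A freshly generated random UUID has the same shape; letting a program
    # pick the value is the same idea as "tune2fs -U random".
    print(describe(uuid.uuid4()))
```

Because the value is a fixed-size number with reserved version and variant
bits, there is no room in it for descriptive text; that is what the label is
for.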

> The first link in the list is interesting :)
> 
> I think the reason the USB has SO MANY errors on the FS is because I
> possibly unplugged it, before umounting it!
> 
> I understand that the umount command flushes any disk I/O buffers
> back to the drive?

Correct; I'd do a sync after the umount just to be absolutely sure,
though.  IIRC the umount doesn't wait until the USB stick has
acknowledged that it is done writing everything to flash.

	     	       	     	 	- Ted



From danielk1977 at gmail.com  Mon Dec  6 18:42:08 2010
From: danielk1977 at gmail.com (Dan Kennedy)
Date: Tue, 07 Dec 2010 01:42:08 +0700
Subject: SQLite and ext3 journalling mode
Message-ID: <4CFD2E80.5090203@gmail.com>

Hi,

Are SQLite users that are worried about losing data that has been
committed (fsynced) better off setting data=journal than
data=ordered (or even data=writeback)?

The context is trying to reduce the number of writes to a flash
file-system without sacrificing data integrity in the event of a
power failure or OS crash.

Thanks,
Dan Kennedy.



From Ralf.Hildebrandt at charite.de  Mon Dec  6 19:34:23 2010
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Mon, 6 Dec 2010 20:34:23 +0100
Subject: Squid and first-level subdirectories & second-level
	subdirectories	on ext3/4
In-Reply-To: <6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca>
References: <20100813100055.GT16572@charite.de>
	<6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca>
Message-ID: <20101206193423.GC28967@charite.de>

* Andreas Dilger <adilger at dilger.ca>:

> In ext3/4 the top-level inodes are spread around the filesystem, on the
> assumption that something like /home or / is allocating trees of
> unrelated subdirectories at the top level, but that files within those
> subdirectories ARE related and should be allocated together.
> 
> Depending on how many files are in your cache, the 256 * {small files}
> is likely too big to fit into a single block group (32k inodes, 32k
> blocks) so you may want to consider marking the first level of
> subdirectories with the "TOPDIR" flag, that indicates the second-level
> (256) subdirs should also be spread around the disk.

How do I mark subdirectories with the "TOPDIR" flag?

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebrandt at charite.de | http://www.charite.de
	    



From Ralf.Hildebrandt at charite.de  Mon Dec  6 20:33:27 2010
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Mon, 6 Dec 2010 21:33:27 +0100
Subject: Squid and first-level subdirectories & second-level
	subdirectories	on ext3/4
In-Reply-To: <20101206193423.GC28967@charite.de>
References: <20100813100055.GT16572@charite.de>
	<6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca>
	<20101206193423.GC28967@charite.de>
Message-ID: <20101206203327.GG28967@charite.de>

* Ralf Hildebrandt <Ralf.Hildebrandt at charite.de>:
> * Andreas Dilger <adilger at dilger.ca>:
> 
> > In ext3/4 the top-level inodes are spread around the filesystem, on the
> > assumption that something like /home or / is allocating trees of
> > unrelated subdirectories at the top level, but that files within those
> > subdirectories ARE related and should be allocated together.
> > 
> > Depending on how many files are in your cache, the 256 * {small files}
> > is likely too big to fit into a single block group (32k inodes, 32k
> > blocks) so you may want to consider marking the first level of
> > subdirectories with the "TOPDIR" flag, that indicates the second-level
> > (256) subdirs should also be spread around the disk.
> 
> How do I mark subdirectories with the "TOPDIR" flag?

Found it: chattr +T



From tytso at mit.edu  Mon Dec  6 21:31:02 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 16:31:02 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFD2E80.5090203@gmail.com>
References: <4CFD2E80.5090203@gmail.com>
Message-ID: <20101206213102.GA24607@thunk.org>

On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
> 
> Are SQLite users that are worried about losing data that has been
> committed (fsynced) better off setting data=journal than
> data=ordered (or even data=writeback)?

Well, they won't be better off from a data integrity point of view.
Depending on how SQLite is configured, and how many fsync's are issued
by SQLite in response to application queries, and depending on your
background workload by other applications, using data=journal *might*
be a performance win.

In general, though, if you have background workloads that are
downloading torrents, data=journal is going to hurt a lot.  So I don't
recommend it except for fairly specialized deployments where there's
only one primary user of the file system.

          	 	      	     	  - Ted



From danielk1977 at gmail.com  Wed Dec  8 11:52:32 2010
From: danielk1977 at gmail.com (Dan Kennedy)
Date: Wed, 08 Dec 2010 18:52:32 +0700
Subject: SQLite and ext3 journalling mode
In-Reply-To: <20101206213102.GA24607@thunk.org>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org>
Message-ID: <4CFF7180.1030607@gmail.com>

On 12/07/2010 04:31 AM, Ted Ts'o wrote:
> On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
>>
>> Are SQLite users that are worried about losing data that has been
>> committed (fsynced) better off setting data=journal than
>> data=ordered (or even data=writeback)?
>
> Well, they won't be better off from a data integrity point of view.
> Depending on how SQLite is configured, and how many fsync's are issued
> by SQLite in response to application queries, and depending on your
> background workload by other applications, using data=journal *might*
> be a performance win.
>
> In general, though, if you have background workloads that are
> downloading torrents, data=journal is going to hurt a lot.  So I don't
> recommend it except for fairly specialized deployments where there's
> only one primary user of the file system.

Thanks. But to be clear, is data=ordered better than data=writeback
wrt. data integrity following a power failure?

Regards,
Dan.



From ricwheeler at gmail.com  Wed Dec  8 16:25:06 2010
From: ricwheeler at gmail.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 11:25:06 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFF7180.1030607@gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org>
	<4CFF7180.1030607@gmail.com>
Message-ID: <4CFFB162.9030306@gmail.com>

On 12/08/2010 06:52 AM, Dan Kennedy wrote:
> On 12/07/2010 04:31 AM, Ted Ts'o wrote:
>> On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
>>>
>>> Are SQLite users that are worried about losing data that has been
>>> committed (fsynced) better off setting data=journal than
>>> data=ordered (or even data=writeback)?
>>
>> Well, they won't be better off from a data integrity point of view.
>> Depending on how SQLite is configured, and how many fsync's are issued
>> by SQLite in response to application queries, and depending on your
>> background workload by other applications, using data=journal *might*
>> be a performance win.
>>
>> In general, though, if you have background workloads that are
>> downloading torrents, data=journal is going to hurt a lot.  So I don't
>> recommend it except for fairly specialized deployments where there's
>> only one primary user of the file system.
>
> Thanks. But to be clear, is data=ordered better than data=writeback
> wrt. data integrity following a power failure?
>
> Regards,
> Dan.
>

Data integrity can mean a couple of different things.

If you are file system meta-data centric (i.e., a file system developer or just 
worried about having to run fsck after a crash to repair the file system), then 
both options *should* be equivalent.

If you are one of those annoying users who define data integrity to include 
those annoying details like will my file have garbage in it after a crash that 
will make my DB or other app puke, then data ordered is clearly more robust.

Note that most distributions (including RHEL) support & focus testing only 
ordered mode....

Hope this helps :)

Ric



From drh at sqlite.org  Wed Dec  8 16:56:21 2010
From: drh at sqlite.org (Richard Hipp)
Date: Wed, 8 Dec 2010 11:56:21 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFFB162.9030306@gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org>
	<4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com>
Message-ID: <AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>

On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler <ricwheeler at gmail.com> wrote:

> On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>
>>
>> Thanks. But to be clear, is data=ordered better than data=writeback
>> wrt. data integrity following a power failure?
>>
>> Regards,
>> Dan.
>>
>>
> Data integrity can mean a couple of different things.
>
> If you are file system meta-data centric (i.e., a file system developer or
> just worried about having to run fsck after a crash to repair the file
> system), then both options *should* be equivalent.
>
> If you are one of those annoying users who define data integrity to include
> those annoying details like will my file have garbage in it after a crash
> that will make my DB or other app puke, then data ordered is clearly more
> robust.
>

Thanks, Ric.  Yes, we are numbered among the "annoying users".  Based on
what you are telling us, we'll recommend that people use data=ordered,
barrier=1 for maximum data reliability in the face of power loss.
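
A rough way to check whether an /etc/fstab options field asks for these
settings explicitly is sketched below.  The helper names are made up for this
example, and it only looks at options that are spelled out; anything left to
the kernel or distribution default is not detected.

```python
def parse_mount_options(options):
    """Split a comma-separated mount options field into a dict."""
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        if key:
            # Flags such as "noatime" carry no value; record them as True.
            parsed[key] = value if sep else True
    return parsed


def follows_recommendation(options):
    """True if data=ordered and barrier=1 are both listed explicitly."""
    opts = parse_mount_options(options)
    return opts.get("data") == "ordered" and opts.get("barrier") == "1"


if __name__ == "__main__":
    for field in ("defaults,data=ordered,barrier=1", "defaults,data=writeback"):
        print(field, "->", follows_recommendation(field))
```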


>
> Note that most distributions (including RHEL) support & focus testing only
> ordered mode....
>
> Hope this helps :)
>
> Ric
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>



-- 
D. Richard Hipp
drh at sqlite.org

From ricwheeler at gmail.com  Wed Dec  8 17:07:32 2010
From: ricwheeler at gmail.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 12:07:32 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>
References: <4CFD2E80.5090203@gmail.com>
	<20101206213102.GA24607@thunk.org>	<4CFF7180.1030607@gmail.com>
	<4CFFB162.9030306@gmail.com>
	<AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>
Message-ID: <4CFFBB54.8010205@gmail.com>

On 12/08/2010 11:56 AM, Richard Hipp wrote:
>
>
> On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler <ricwheeler at gmail.com 
> <mailto:ricwheeler at gmail.com>> wrote:
>
>     On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>
>
>         Thanks. But to be clear, is data=ordered better than data=writeback
>         wrt. data integrity following a power failure?
>
>         Regards,
>         Dan.
>
>
>     Data integrity can mean a couple of different things.
>
>     If you are file system meta-data centric (i.e., a file system developer or
>     just worried about having to run fsck after a crash to repair the file
>     system), then both options *should* be equivalent.
>
>     If you are one of those annoying users who define data integrity to
>     include those annoying details like will my file have garbage in it after
>     a crash that will make my DB or other app puke, then data ordered is
>     clearly more robust.
>
>
> Thanks, Ric.  Yes, we are numbered among the "annoying users".  Based on what 
> you are telling us, we'll recommend that people use data=ordered, barrier=1 
> for maximum data reliability in the face of power loss.

That is what I do as well - there are use cases and users that prefer the lower 
latency and can accept the trade offs that come with data writeback or 
non-barrier use, but I certainly think most users would be better using the 
settings you have above.

Good luck!

Ric



From Mike.Miller at hp.com  Wed Dec  8 19:02:13 2010
From: Mike.Miller at hp.com (Miller, Mike (OS Dev))
Date: Wed, 8 Dec 2010 19:02:13 +0000
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFFBB54.8010205@gmail.com>
References: <4CFD2E80.5090203@gmail.com>	<20101206213102.GA24607@thunk.org>
	<4CFF7180.1030607@gmail.com>	<4CFFB162.9030306@gmail.com>
	<AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>
	<4CFFBB54.8010205@gmail.com>
Message-ID: <0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>



> -----Original Message-----
> From: ext3-users-bounces at redhat.com [mailto:ext3-users-
> bounces at redhat.com] On Behalf Of Ric Wheeler
> Sent: Wednesday, December 08, 2010 11:08 AM
> To: Richard Hipp
> Cc: ext3-users at redhat.com
> Subject: Re: SQLite and ext3 journalling mode
> 
> On 12/08/2010 11:56 AM, Richard Hipp wrote:
> >
> >
> > On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler <ricwheeler at gmail.com
> > <mailto:ricwheeler at gmail.com>> wrote:
> >
> >     On 12/08/2010 06:52 AM, Dan Kennedy wrote:
> >
> >
> >         Thanks. But to be clear, is data=ordered better than
> data=writeback
> >         wrt. data integrity following a power failure?
> >
> >         Regards,
> >         Dan.
> >
> >
> >     Data integrity can mean a couple of different things.
> >
> >     If you are file system meta-data centric (i.e., a file system
> developer or
> >     just worried about having to run fsck after a crash to repair the
> file
> >     system), then both options *should* be equivalent.
> >
> >     If you are one of those annoying users who define data integrity
> to
> >     include those annoying details like will my file have garbage in
> it after
> >     a crash that will make my DB or other app puke, then data ordered
> is
> >     clearly more robust.
> >
> >
> > Thanks, Ric.  Yes, we are numbered among the "annoying users".  Based
> on what
> > you are telling us, we'll recommend that people use data=ordered,
> barrier=1

Just as an FYI, not all HW vendors enable the drive write cache especially on array controllers. In those cases barriers do nothing.

-- mikem

> > for maximum data reliability in the face of power loss.
> 
> That is what I do as well - there are use cases and users that prefer
> the lower
> latency and can accept the trade offs that come with data writeback or
> non-barrier use, but I certainly think most users would be better using
> the
> settings you have above.
> 
> Good luck!
> 
> Ric
> 



From rwheeler at redhat.com  Wed Dec  8 13:26:27 2010
From: rwheeler at redhat.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 08:26:27 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>
References: <4CFD2E80.5090203@gmail.com>	<20101206213102.GA24607@thunk.org>	<4CFF7180.1030607@gmail.com>	<4CFFB162.9030306@gmail.com>	<AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>	<4CFFBB54.8010205@gmail.com>
	<0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>
Message-ID: <4CFF8783.60104@redhat.com>

On 12/08/2010 02:02 PM, Miller, Mike (OS Dev) wrote:
>
>> -----Original Message-----
>> From: ext3-users-bounces at redhat.com [mailto:ext3-users-
>> bounces at redhat.com] On Behalf Of Ric Wheeler
>> Sent: Wednesday, December 08, 2010 11:08 AM
>> To: Richard Hipp
>> Cc: ext3-users at redhat.com
>> Subject: Re: SQLite and ext3 journalling mode
>>
>> On 12/08/2010 11:56 AM, Richard Hipp wrote:
>>>
>>> On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler<ricwheeler at gmail.com
>>> <mailto:ricwheeler at gmail.com>>  wrote:
>>>
>>>      On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>>>
>>>
>>>          Thanks. But to be clear, is data=ordered better than
>> data=writeback
>>>          wrt. data integrity following a power failure?
>>>
>>>          Regards,
>>>          Dan.
>>>
>>>
>>>      Data integrity can mean a couple of different things.
>>>
>>>      If you are file system meta-data centric (i.e., a file system
>> developer or
>>>      just worried about having to run fsck after a crash to repair the
>> file
>>>      system), then both options *should* be equivalent.
>>>
>>>      If you are one of those annoying users who define data integrity
>> to
>>>      include those annoying details like will my file have garbage in
>> it after
>>>      a crash that will make my DB or other app puke, then data ordered
>> is
>>>      clearly more robust.
>>>
>>>
>>> Thanks, Ric.  Yes, we are numbered among the "annoying users".  Based
>> on what
>>> you are telling us, we'll recommend that people use data=ordered,
>> barrier=1
> Just as an FYI, not all HW vendors enable the drive write cache especially on array controllers. In those cases barriers do nothing.
>
> -- mikem
>
>

Right - upstream has been working to make sure that we can default to barriers 
on and not see a performance hit for devices like arrays that don't need them ...

Ric



From tytso at mit.edu  Wed Dec  8 22:30:15 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Wed, 8 Dec 2010 17:30:15 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org>
	<4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com>
	<AANLkTi=dQN8sdO-+Ps2hkoKf+6kbuxvXOXyvPB46GtN-@mail.gmail.com>
Message-ID: <20101208223015.GD2921@thunk.org>

On Wed, Dec 08, 2010 at 11:56:21AM -0500, Richard Hipp wrote:
> 
> Thanks, Ric.  Yes, we are numbered among the "annoying users".  Based on
> what you are telling us, we'll recommend that people use data=ordered,
> barrier=1 for maximum data reliability in the face of power loss.

Note that given that SQLite uses fsync() properly, data=ordered
shouldn't hurt or help --- with respect to the very narrow issue of
data integrity in the SQLite database.

What data=ordered protects against is the possibility that stale data
(from a deleted file, possibly some other user's) might show up
as "garbage" in a file whose newly allocated blocks didn't get written
out to disk before a system crash or other unclean shutdown.

To the extent that SQLite properly uses fsync() and doesn't pay
attention to data in parts of the db file that haven't been committed
yet, this won't matter.  From the perspective of users who care about
stale data showing up in a file being a potential security issue, they
should definitely use data=ordered in ext3.

However, if you have a specific use case where the only thing that
matters is SQLite, then assuming that fsync() is being used properly,
SQLite itself should be OK.
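
For readers unsure what "using fsync() properly" looks like from an
application, here is a minimal Python sketch.  It is not how SQLite is
implemented; the function name is made up, and whether the data really reaches
stable storage still depends on the barrier and drive write cache behaviour
discussed elsewhere in this thread.

```python
import os


def durable_write(path, data):
    """Write bytes to path and return only after fsync() has succeeded."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's own buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to push the file to the device
    # Also fsync the containing directory so that the directory entry for a
    # newly created file is itself on disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that treats data as committed only after such a call returns
never depends on blocks that were still unwritten at the time of a crash.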

I hope this helps,

						- Ted



From florian.weber at bn-paf.de  Mon Dec 27 16:38:09 2010
From: florian.weber at bn-paf.de (Florian Weber)
Date: Mon, 27 Dec 2010 16:38:09 -0000
Subject: Overwritten beginning of ext3 filesystem. Recovery?
Message-ID: <20101227173751.qmvcc2ec8cgwwk0k@mail.bn-paf.de>

Hello list

I accidentally trashed the first ~10-20GB of a 1TB ext3 filesystem with
a heedless RAID1 rebuild (excruciating detail below). I'm now looking for
options to get as much as possible of the remaining data back.

I've been searching the web for over a day now, but all my results are either
not what I need (MBR, partition table and superblock are OK) or too low-level
(recovering many thousands of nameless and structureless mails/jpgs/docs just
doesn't cut it here, IMVHO).
My main problem is not that I accidentally deleted files, but that basically
my / directory just went "poof" and left the rest sitting around.

Since the damaged filesystem was clean before my accident, I'm figuring I just
might get most of the data back: even much of the directory structure should
still be there if I only knew how to get at it.

I'd be most grateful for any tips, tools, or even documentation to aid in
writing my own tool.

Thanks in advance for your time,
Florian Weber

PS: that _was_ my backup :-( Thanks for not mentioning it.

-------------

Details:

Starting point:
---------------
I've been running the following setup on my machine:
* Two same-size harddisks, currently 1TB, one big partition each --> sda[1], sdb[1]
* Linux software RAID1 consisting of these partitions --> md0
* A single ext3 filesystem, default parameters, reserved blocks lowered to 1%
* All system and data inside this single partition, ca. 350-400GB
* (Much too) infrequent backups ... yes, yes, I know, I know ...

Intentions:
-----------
After many years, I wanted to move from Gentoo to KUbuntu. No big deal:
* Shutdown PC, pull disk sdb from the RAID
* Install Ubuntu on sda as if working on a blank disk (setup as above, with
one of the RAID1 disks physically missing during the install)
* Boot the new system from sda, still in degraded mode
* Treating sdb like a standalone ext3 disk: mount, copy configs and /home,
umount
* Get the system into working order (config files reconciled, all applications
running)
* Determine that the "old" stuff is not needed anymore
* Put sdb back into the RAID1 and rebuild

What went wrong:
----------------
Before the initial shutdown, I did not change the partition type on sdb from
0xFD to 0x83 to prevent RAID autodetection. Booting with sdb reattached (to get
at my personal data) would therefore (correctly) have resulted in a RAID
rebuild --> very bad.

So I figured: I'll attach the disk, boot with "raid=noautodetect" in the
kernel commandline, and I'll be fine. But: unlike my previous setup, Ubuntu has
a silent bootloader and I missed my chance to enter the commandline.

And the RAID instantly started rebuilding itself onto my backup disk :-O

I quickly realised what was happening and cleanly shut down my system
(incurring some additional damage from the running rebuild, but the worst was
already done). Total running time was about 3 minutes, in parallel to the
system booting up and shutting down.
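
As a back-of-the-envelope check on how much of the disk can have been
overwritten, assuming the rebuild wrote sequentially from the start of the
partition at no more than the rate given (the function name is just for this
sketch):

```python
def max_overwritten_gb(seconds, mb_per_second=100):
    """Upper bound, in GB, on data written at a constant rate for some time."""
    return seconds * mb_per_second / 1000


if __name__ == "__main__":
    for minutes in (3, 4):
        gb = max_overwritten_gb(minutes * 60)
        print("%d min at 100 MB/s: at most %.0f GB" % (minutes, gb))
```

Three to four minutes at 100MB/sec comes to an upper bound of 18 to 24GB,
which is where my estimate of the damaged region comes from.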

What I have now:
----------------
* A working, new Ubuntu installation on a degraded RAID1 array, without
personal data. I'm currently typing on this system.

* A harddisk (sdb) that previously contained a working system with a total of
350-400GB data, but was subject to a RAID1 rebuild for <3-4 minutes
at <=100MB/sec. The disc is not connected at the moment.

* The MBR on sdb is the new one. That's OK.
* The partition table on sdb is the new one. It looks identical to the old one.
* The ext3 superblock on sdb1 is the new one. It's basically the same as the
old one. I compared it against one of the (old) backup superblocks at the end
of the partition.
* I have a dd image of partition sdb1
* I can mount the image of sdb1 and do an ls. I see data from the new system.
Much content is missing, obviously, since it was not synced over yet
* I can "fsck -n" the image of sdb1. Many errors of course ("inode contains
invalid block", "too many illegal blocks", "i_size wrong", "i_Blocks wrong"),
since much stuff was not synced over yet
* At some point, "fsck -n" stops with "illegal indirect block"
* I have not yet tried to "fsck -y". That would be my next step.

* I have 1TB of free space available and can organise more
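
For reference, the places where backup superblocks are expected can be
computed.  The sketch below assumes the sparse_super feature (backups in block
group 1 and in groups that are powers of 3, 5 and 7), a 4KB block size and
32768 blocks per group; I have not verified these parameters on the image, and
the function names are made up.  The real values should be taken from the
filesystem itself, for example with dumpe2fs.

```python
def backup_superblock_groups(group_count):
    """Block groups that hold a superblock backup under sparse_super."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


def backup_superblock_blocks(group_count, blocks_per_group=32768,
                             first_data_block=0):
    """Filesystem block numbers at which the backup superblocks start."""
    return [first_data_block + g * blocks_per_group
            for g in backup_superblock_groups(group_count)]


if __name__ == "__main__":
    # First few expected locations for a filesystem with at least 50 groups.
    print(backup_superblock_blocks(50))
```

With 32768 blocks of 4KB each, a block group covers 128MB, so under these
assumptions the first several backups would sit inside the overwritten region
and only the later ones would still hold the old data.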

I do realise this is not for the faint of heart, but I'm done with my fainting
for this instance ;-)

Still with hope,
Florian Weber

-------------------------------------------------------
Buergernetz Pfaffenhofen Webmail - http://www.bn-paf.de