From keith at karsites.net Sun Dec 5 22:10:38 2010
From: keith at karsites.net (Keith Roberts)
Date: Sun, 5 Dec 2010 22:10:38 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
Message-ID:

This is stopping my new Centos 5.5 installation from booting.

I have dropped into maintenance mode and run e2fsck without getting any errors. I used the -c option, and no bad blocks were found.

I've run another disk checker program called Vivard, from the Ultimate Boot CD. That completed without any errors as well.

So it is a mystery why I get this error at boot time. Any ideas what's happening, please?

Where can I find out more about ext3, such as inodes, dtime and the imagic flag?

Kind Regards,

Keith Roberts

--
In theory, theory and practice are the same; in practice they are not.

This email was sent from my laptop with Centos 5.5

From tytso at mit.edu Mon Dec 6 02:24:53 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Sun, 5 Dec 2010 21:24:53 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To:
References:
Message-ID: <20101206022453.GF4273@thunk.org>

On Sun, Dec 05, 2010 at 10:10:38PM +0000, Keith Roberts wrote:
> This is stopping my new Centos 5.5 installation from booting.
>
> I have dropped into maintenance mode and run e2fsck without getting
> any errors. I used the -c option, and no bad blocks were found.

That error "Inode ... has imagic flag set" is an e2fsck error. Do you have more than one file system on your system? Maybe you checked the one file system, and the error was on another file system.

That error very often means that part of your inode table has gotten corrupted, since that flag should never get set during normal operation. (It was implemented for AFS file servers.)
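To make the check behind this message concrete, here is a rough Python model of it. It assumes the constants from the ext2/ext3 on-disk format headers, EXT2_IMAGIC_FL = 0x00002000 (a bit in the inode's i_flags) and EXT2_FEATURE_COMPAT_IMAGIC_INODES = 0x0002 (a bit in the superblock's s_feature_compat); as we understand it, e2fsck only complains when the flag shows up on a file system that does not have that feature enabled. The function name is hypothetical, the related "in use, but has dtime set" check is included for comparison, and this is a simplification rather than e2fsck's actual pass 1 code.

```python
from typing import List

# Values as recalled from the ext2/ext3 on-disk format headers (ext2_fs.h);
# verify them against your e2fsprogs sources before relying on them.
EXT2_IMAGIC_FL = 0x00002000                 # i_flags bit for AFS "imagic" inodes
EXT2_FEATURE_COMPAT_IMAGIC_INODES = 0x0002  # s_feature_compat bit permitting them


def inode_problems(i_flags: int, i_dtime: int, in_use: bool,
                   s_feature_compat: int) -> List[str]:
    """Hypothetical, simplified model of two e2fsck pass-1 checks for one inode."""
    problems = []
    # An inode that is still in use should not carry a deletion time.
    if in_use and i_dtime != 0:
        problems.append("in use, but has dtime set")
    # The imagic flag is only legitimate on file systems that enable the feature.
    imagic_fs = bool(s_feature_compat & EXT2_FEATURE_COMPAT_IMAGIC_INODES)
    if (i_flags & EXT2_IMAGIC_FL) and not imagic_fs:
        problems.append("has imagic flag set")
    return problems


# An ext3 file system (has_journal = 0x0004) without the imagic feature,
# and an inode whose flags and dtime fields contain stray bits:
print(inode_problems(0x00002000, 1291500000, True, 0x0004))
# -> ['in use, but has dtime set', 'has imagic flag set']
```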
- Ted

From keith at karsites.net Mon Dec 6 13:07:38 2010
From: keith at karsites.net (Keith Roberts)
Date: Mon, 6 Dec 2010 13:07:38 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
In-Reply-To: <20101206022453.GF4273@thunk.org>
References: <20101206022453.GF4273@thunk.org>
Message-ID:

On Sun, 5 Dec 2010, Ted Ts'o wrote:

> To: Keith Roberts
> From: Ted Ts'o
> Subject: Re: Inode 196617 has imagic flag set
>
> On Sun, Dec 05, 2010 at 10:10:38PM +0000, Keith Roberts wrote:
>> This is stopping my new Centos 5.5 installation from booting.
>>
>> I have dropped into maintenance mode and run e2fsck without getting
>> any errors. I used the -c option, and no bad blocks were found.
>
> That error "Inode ... has imagic flag set" is an e2fsck error. Do you
> have more than one file system on your system? Maybe you checked the
> one file system, and the error was on another file system.

Absolutely spot on, Ted!

I did have a USB stick plugged in, to boot my Kickstart file from. I also added another partition to the USB drive, also with a partition label called 'websites', so I could upload my websites to my hosting provider from my laptop.

Maybe e2fsck was getting confused at having two partitions with the same partition label on the system?

So I renamed the partition on my USB drive to my-websites.

+++

I have found it now, Ted.

I'm on my laptop, and have run e2fsck on the USB drive. Here is the output:

[root at karsites ~]# e2label /dev/sdb2 my-websites
[root at karsites ~]# e2fsck -vf -Cd /dev/sdb2
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 196610 has zero dtime. Fix? no
Inode 196617 is in use, but has dtime set. Fix? no
Inode 196617 has imagic flag set. Clear? no
Inode 196618 is in use, but has dtime set. Fix? no
Inode 196618 has imagic flag set. Clear? no
Inode 196619 is in use, but has dtime set. Fix? no
Inode 196619 has imagic flag set. Clear? no
Inode 196620 is in use, but has dtime set. Fix? no
Inode 196620 has imagic flag set. Clear?

So it looks like e2fsck was checking the USB drive at bootup, because it inadvertently had the same partition label name, 'websites'. I thought it was the main HDD I was installing Centos onto that had the errors.

Whew! That's cool, because this is a brand new 500GB hard drive.

I shall make sure in future that there are no conflicts with my partition label names, especially with removable devices like USB drives.

Thanks for all the help.

Kind Regards,

Keith Roberts

PS Are there any PDF docs that would give me an overview of the ext3 FS, and how it works?

> That error very often means that part of your inode table has gotten
> corrupted, since that flag should never get set during normal
> operation. (It was implemented for AFS file servers.)
>
> - Ted
>

--
In theory, theory and practice are the same; in practice they are not.

This email was sent from my laptop with Centos 5.5

From tytso at mit.edu Mon Dec 6 13:27:44 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 08:27:44 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To:
References: <20101206022453.GF4273@thunk.org>
Message-ID: <20101206132744.GA8135@thunk.org>

On Mon, Dec 06, 2010 at 01:07:38PM +0000, Keith Roberts wrote:
>
> Maybe e2fsck was getting confused at having two partitions with the
> same partition label on the system?

Well, the blkid library, specifically, was getting confused. E2fsck uses the blkid library to map LABEL= and UUID= references to device names.

One thing that you might do for devices that are always present (which generally means you're not changing them in your /etc/fstab often) is to use a UUID= reference instead. They are definitely less convenient than labels, but they are also much less likely to cause confusion by having duplicately labelled file systems.

Granted, no one wants to *type* "mount UUID=9e132c06-1fd0-4bbd-ad06-5995a8f45b26"; they'd much rather type "mount LABEL=websites".
But if the only place a UUID=... specification shows up is in /etc/fstab, it might be worth it. Then you can just zap the label on file systems that you only plan to reference via UUID, and then that reduces the chances for confusion in the future.

Best regards,

> PS Are there any PDF docs that would give me an overview of the ext3
> FS, and how it works?

Try the list of Articles and Publications here:

https://ext4.wiki.kernel.org/index.php/Publications

- Ted

From keith at karsites.net Mon Dec 6 14:20:24 2010
From: keith at karsites.net (Keith Roberts)
Date: Mon, 6 Dec 2010 14:20:24 +0000 (GMT)
Subject: Inode 196617 has imagic flag set
In-Reply-To: <20101206132744.GA8135@thunk.org>
References: <20101206022453.GF4273@thunk.org> <20101206132744.GA8135@thunk.org>
Message-ID:

On Mon, 6 Dec 2010, Ted Ts'o wrote:

> To: Keith Roberts
> From: Ted Ts'o
> Subject: Re: Inode 196617 has imagic flag set
>
> On Mon, Dec 06, 2010 at 01:07:38PM +0000, Keith Roberts wrote:
>>
>> Maybe e2fsck was getting confused at having two partitions with the
>> same partition label on the system?
>
> Well, the blkid library, specifically, was getting confused. E2fsck
> uses the blkid library to map LABEL= and UUID= references to device
> names.

What if the blkid library were to report some sort of 'duplicate label name' error when it maps the devices to label names? That would be a great help.

> One thing that you might do for devices that are always present (which
> generally means you're not changing them in your /etc/fstab often) is
> to use a UUID= reference instead. They are definitely less convenient
> than labels, but they are also much less likely to cause confusion by
> having duplicately labelled file systems.
>
> Granted, no one wants to *type* "mount
> UUID=9e132c06-1fd0-4bbd-ad06-5995a8f45b26"; they'd much rather type
> "mount LABEL=websites". But if the only place a
> UUID=... specification shows up is in /etc/fstab, it might be worth
> it.
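Keith's 'duplicate label name' suggestion can be approximated outside the blkid library. The sketch below is a minimal illustration, assuming the one-line-per-device `KEY="value"` format printed by the `blkid` command; the function name and the sample device names are hypothetical, and this is not a blkid API.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the KEY="value" pairs printed by the blkid command (assumed format).
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def find_duplicate_labels(blkid_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {label: [devices]} for every LABEL used by more than one device."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for line in blkid_lines:
        device, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "device: KEY=..." line
        label = dict(_PAIR.findall(rest)).get("LABEL")
        if label:
            by_label[label].append(device.strip())
    return {label: devs for label, devs in by_label.items() if len(devs) > 1}


# Hypothetical output resembling Keith's setup: a second 'websites' on the USB stick.
sample = [
    '/dev/sda1: LABEL="/boot" TYPE="ext3"',
    '/dev/sda2: LABEL="websites" TYPE="ext3"',
    '/dev/sdb2: LABEL="websites" TYPE="ext3"',
]
print(find_duplicate_labels(sample))  # -> {'websites': ['/dev/sda2', '/dev/sdb2']}
```

Run before boot-time checks, a report like this would flag the clash that sent e2fsck to the USB stick; it does not change how blkid itself resolves LABEL= references.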
Can I set the UUID value with some descriptive text (something like a long label name), or is the UUID only system generated?

> Then you can just zap the label on file systems that you only plan to
> reference via UUID, and then that reduces the chances for confusion in
> the future.
>
> Best regards,
>
>> PS Are there any PDF docs that would give me an overview of the ext3
>> FS, and how it works?
>
> Try the list of Articles and Publications here:
>
> https://ext4.wiki.kernel.org/index.php/Publications

I'm reading up on that stuff now. The first link in the list is interesting :)

I think the reason the USB has SO MANY errors on the FS is because I possibly unplugged it before umounting it!

I understand that the umount command flushes any disk I/O buffers back to the drive?

Kind Regards,

Keith Roberts

---
In theory, theory and practice are the same; in practice they are not.

This email was sent from my laptop with Centos 5.5

From tytso at mit.edu Mon Dec 6 15:46:46 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 10:46:46 -0500
Subject: Inode 196617 has imagic flag set
In-Reply-To:
References: <20101206022453.GF4273@thunk.org> <20101206132744.GA8135@thunk.org>
Message-ID: <20101206154646.GE8135@thunk.org>

On Mon, Dec 06, 2010 at 02:20:24PM +0000, Keith Roberts wrote:
>
> Can I set the UUID value with some descriptive text (something like
> a long label name), or is the UUID only system generated?

The UUID is a Universally Unique ID; it is a 128-bit number, constructed using the rules specified by RFC 4122. See:

http://www.ietf.org/rfc/rfc4122.txt

You can set the UUID to something else, although the only way I suggest people make use of this functionality is to use "tune2fs -U random" after doing an image copy of a file system. The whole point of a UUID is that it should be universally unique, and humans are notoriously bad at picking IDs that are truly unique.
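The size and structure Ted describes can be seen in a few lines of Python. This sketch builds a random (version 4) UUID the way RFC 4122 lays it out: 128 random bits, with 4 of them overwritten by the version number and 2 by the variant. The standard library's `uuid.uuid4()` does the same job; the hand-rolled function (a hypothetical name) only makes the fixed bits visible. It is an illustration of the format, not a description of how tune2fs generates its UUIDs, but it also hints at why a hand-picked "descriptive" value fits poorly: only 122 of the 128 bits carry randomness, and a human-chosen string carries no uniqueness guarantee at all.

```python
import os
import uuid


def random_uuid_v4() -> uuid.UUID:
    """Build an RFC 4122 version-4 (random) UUID by hand, to show its 128-bit layout."""
    raw = bytearray(os.urandom(16))   # 16 bytes = 128 bits of randomness
    raw[6] = (raw[6] & 0x0F) | 0x40   # top 4 bits of byte 6: version number 4
    raw[8] = (raw[8] & 0x3F) | 0x80   # top 2 bits of byte 8: RFC 4122 variant (binary 10)
    return uuid.UUID(bytes=bytes(raw))


u = random_uuid_v4()
print(u, u.version, len(u.bytes) * 8)  # prints the UUID, then 4, then 128
```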
> The first link in the list is interesting :)
>
> I think the reason the USB has SO MANY errors on the FS is because I
> possibly unplugged it before umounting it!
>
> I understand that the umount command flushes any disk I/O buffers
> back to the drive?

Correct; I'd do a sync after the umount just to be absolutely sure, though. IIRC the umount doesn't wait until the USB stick has acknowledged that it is done writing everything to flash.

- Ted

From danielk1977 at gmail.com Mon Dec 6 18:42:08 2010
From: danielk1977 at gmail.com (Dan Kennedy)
Date: Tue, 07 Dec 2010 01:42:08 +0700
Subject: SQLite and ext3 journalling mode
Message-ID: <4CFD2E80.5090203@gmail.com>

Hi,

Are SQLite users that are worried about losing data that has been committed (fsynced) better off setting data=journal than data=ordered (or even data=writeback)?

The context is trying to reduce the number of writes to a flash file system without sacrificing data integrity in the event of a power failure or OS crash.

Thanks,
Dan Kennedy.

From Ralf.Hildebrandt at charite.de Mon Dec 6 19:34:23 2010
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Mon, 6 Dec 2010 20:34:23 +0100
Subject: Squid and first-level subdirectories & second-level subdirectories on ext3/4
In-Reply-To: <6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca>
References: <20100813100055.GT16572@charite.de> <6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca>
Message-ID: <20101206193423.GC28967@charite.de>

* Andreas Dilger :
> In ext3/4 the top-level inodes are spread around the filesystem, on the
> assumption that something like /home or / is allocating trees of
> unrelated subdirectories at the top level, but that files within those
> subdirectories ARE related and should be allocated together.
>
> Depending on how many files are in your cache, the 256 * {small files}
> is likely too big to fit into a single block group (32k inodes, 32k
> blocks) so you may want to consider marking the first level of
> subdirectories with the "TOPDIR" flag, that indicates the second-level
> (256) subdirs should also be spread around the disk.

How do I mark subdirectories with the "TOPDIR" flag?

--
Ralf Hildebrandt
Geschäftsbereich IT | Abteilung Netzwerk
Charité - Universitätsmedizin Berlin
Campus Benjamin Franklin
Hindenburgdamm 30 | D-12203 Berlin
Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
ralf.hildebrandt at charite.de | http://www.charite.de

From Ralf.Hildebrandt at charite.de Mon Dec 6 20:33:27 2010
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Mon, 6 Dec 2010 21:33:27 +0100
Subject: Squid and first-level subdirectories & second-level subdirectories on ext3/4
In-Reply-To: <20101206193423.GC28967@charite.de>
References: <20100813100055.GT16572@charite.de> <6982E360-6DE7-4DEF-8F4D-E5031EB7151B@dilger.ca> <20101206193423.GC28967@charite.de>
Message-ID: <20101206203327.GG28967@charite.de>

* Ralf Hildebrandt :
> * Andreas Dilger :
>
> > In ext3/4 the top-level inodes are spread around the filesystem, on the
> > assumption that something like /home or / is allocating trees of
> > unrelated subdirectories at the top level, but that files within those
> > subdirectories ARE related and should be allocated together.
> >
> > Depending on how many files are in your cache, the 256 * {small files}
> > is likely too big to fit into a single block group (32k inodes, 32k
> > blocks) so you may want to consider marking the first level of
> > subdirectories with the "TOPDIR" flag, that indicates the second-level
> > (256) subdirs should also be spread around the disk.
>
> How do I mark subdirectories with the "TOPDIR" flag?
Found it: chattr +T

From tytso at mit.edu Mon Dec 6 21:31:02 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Mon, 6 Dec 2010 16:31:02 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFD2E80.5090203@gmail.com>
References: <4CFD2E80.5090203@gmail.com>
Message-ID: <20101206213102.GA24607@thunk.org>

On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
>
> Are SQLite users that are worried about losing data that has been
> committed (fsynced) better off setting data=journal than
> data=ordered (or even data=writeback)?

Well, they won't be better off from a data integrity point of view. Depending on how SQLite is configured, and how many fsync's are issued by SQLite in response to application queries, and depending on your background workload by other applications, using data=journal *might* be a performance win.

In general, though, if you have background workloads that are downloading torrents, data=journal is going to hurt a lot. So I don't recommend it except for fairly specialized deployments where there's only one primary user of the file system.

- Ted

From danielk1977 at gmail.com Wed Dec 8 11:52:32 2010
From: danielk1977 at gmail.com (Dan Kennedy)
Date: Wed, 08 Dec 2010 18:52:32 +0700
Subject: SQLite and ext3 journalling mode
In-Reply-To: <20101206213102.GA24607@thunk.org>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org>
Message-ID: <4CFF7180.1030607@gmail.com>

On 12/07/2010 04:31 AM, Ted Ts'o wrote:
> On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
>>
>> Are SQLite users that are worried about losing data that has been
>> committed (fsynced) better off setting data=journal than
>> data=ordered (or even data=writeback)?
>
> Well, they won't be better off from a data integrity point of view.
> Depending on how SQLite is configured, and how many fsync's are issued
> by SQLite in response to application queries, and depending on your
> background workload by other applications, using data=journal *might*
> be a performance win.
>
> In general, though, if you have background workloads that are
> downloading torrents, data=journal is going to hurt a lot. So I don't
> recommend it except for fairly specialized deployments where there's
> only one primary user of the file system.

Thanks. But to be clear, is data=ordered better than data=writeback wrt. data integrity following a power failure?

Regards,
Dan.

From ricwheeler at gmail.com Wed Dec 8 16:25:06 2010
From: ricwheeler at gmail.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 11:25:06 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFF7180.1030607@gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com>
Message-ID: <4CFFB162.9030306@gmail.com>

On 12/08/2010 06:52 AM, Dan Kennedy wrote:
> On 12/07/2010 04:31 AM, Ted Ts'o wrote:
>> On Tue, Dec 07, 2010 at 01:42:08AM +0700, Dan Kennedy wrote:
>>>
>>> Are SQLite users that are worried about losing data that has been
>>> committed (fsynced) better off setting data=journal than
>>> data=ordered (or even data=writeback)?
>>
>> Well, they won't be better off from a data integrity point of view.
>> Depending on how SQLite is configured, and how many fsync's are issued
>> by SQLite in response to application queries, and depending on your
>> background workload by other applications, using data=journal *might*
>> be a performance win.
>>
>> In general, though, if you have background workloads that are
>> downloading torrents, data=journal is going to hurt a lot. So I don't
>> recommend it except for fairly specialized deployments where there's
>> only one primary user of the file system.
>
> Thanks. But to be clear, is data=ordered better than data=writeback
> wrt. data integrity following a power failure?
>
> Regards,
> Dan.
>

Data integrity can mean a couple of different things.

If you are file system meta-data centric (i.e., a file system developer or just worried about having to run fsck after a crash to repair the file system), then both options *should* be equivalent.

If you are one of those annoying users who define data integrity to include those annoying details like will my file have garbage in it after a crash that will make my DB or other app puke, then data ordered is clearly more robust.

Note that most distributions (including RHEL) support & focus testing only ordered mode....

Hope this helps :)

Ric

From drh at sqlite.org Wed Dec 8 16:56:21 2010
From: drh at sqlite.org (Richard Hipp)
Date: Wed, 8 Dec 2010 11:56:21 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFFB162.9030306@gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com>
Message-ID:

On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler wrote:

> On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>
>>
>> Thanks. But to be clear, is data=ordered better than data=writeback
>> wrt. data integrity following a power failure?
>>
>> Regards,
>> Dan.
>>
>>
> Data integrity can mean a couple of different things.
>
> If you are file system meta-data centric (i.e., a file system developer or
> just worried about having to run fsck after a crash to repair the file
> system), then both options *should* be equivalent.
>
> If you are one of those annoying users who define data integrity to include
> those annoying details like will my file have garbage in it after a crash
> that will make my DB or other app puke, then data ordered is clearly more
> robust.
>

Thanks, Ric. Yes, we are numbered among the "annoying users". Based on what you are telling us, we'll recommend that people use data=ordered, barrier=1 for maximum data reliability in the face of power loss.
>
> Note that most distributions (including RHEL) support & focus testing only
> ordered mode....
>
> Hope this helps :)
>
> Ric
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>

--
D. Richard Hipp
drh at sqlite.org

From ricwheeler at gmail.com Wed Dec 8 17:07:32 2010
From: ricwheeler at gmail.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 12:07:32 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To:
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com>
Message-ID: <4CFFBB54.8010205@gmail.com>

On 12/08/2010 11:56 AM, Richard Hipp wrote:
>
>
> On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler
> wrote:
>
> On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>
>
> Thanks. But to be clear, is data=ordered better than data=writeback
> wrt. data integrity following a power failure?
>
> Regards,
> Dan.
>
>
> Data integrity can mean a couple of different things.
>
> If you are file system meta-data centric (i.e., a file system developer or
> just worried about having to run fsck after a crash to repair the file
> system), then both options *should* be equivalent.
>
> If you are one of those annoying users who define data integrity to
> include those annoying details like will my file have garbage in it after
> a crash that will make my DB or other app puke, then data ordered is
> clearly more robust.
>
>
> Thanks, Ric. Yes, we are numbered among the "annoying users". Based on what
> you are telling us, we'll recommend that people use data=ordered, barrier=1
> for maximum data reliability in the face of power loss.
That is what I do as well - there are use cases and users that prefer the lower latency and can accept the trade offs that come with data writeback or non-barrier use, but I certainly think most users would be better using the settings you have above.

Good luck!

Ric

From Mike.Miller at hp.com Wed Dec 8 19:02:13 2010
From: Mike.Miller at hp.com (Miller, Mike (OS Dev))
Date: Wed, 8 Dec 2010 19:02:13 +0000
Subject: SQLite and ext3 journalling mode
In-Reply-To: <4CFFBB54.8010205@gmail.com>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com> <4CFFBB54.8010205@gmail.com>
Message-ID: <0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>

> -----Original Message-----
> From: ext3-users-bounces at redhat.com [mailto:ext3-users-bounces at redhat.com] On Behalf Of Ric Wheeler
> Sent: Wednesday, December 08, 2010 11:08 AM
> To: Richard Hipp
> Cc: ext3-users at redhat.com
> Subject: Re: SQLite and ext3 journalling mode
>
> On 12/08/2010 11:56 AM, Richard Hipp wrote:
> >
> >
> > On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler
> > wrote:
> >
> > On 12/08/2010 06:52 AM, Dan Kennedy wrote:
> >
> >
> > Thanks. But to be clear, is data=ordered better than data=writeback
> > wrt. data integrity following a power failure?
> >
> > Regards,
> > Dan.
> >
> >
> > Data integrity can mean a couple of different things.
> >
> > If you are file system meta-data centric (i.e., a file system developer or
> > just worried about having to run fsck after a crash to repair the file
> > system), then both options *should* be equivalent.
> >
> > If you are one of those annoying users who define data integrity to
> > include those annoying details like will my file have garbage in it after
> > a crash that will make my DB or other app puke, then data ordered is
> > clearly more robust.
> >
> >
> > Thanks, Ric. Yes, we are numbered among the "annoying users". Based on what
> > you are telling us, we'll recommend that people use data=ordered, barrier=1

Just as an FYI, not all HW vendors enable the drive write cache especially on array controllers. In those cases barriers do nothing.

-- mikem

> > for maximum data reliability in the face of power loss.
>
> That is what I do as well - there are use cases and users that prefer the lower
> latency and can accept the trade offs that come with data writeback or
> non-barrier use, but I certainly think most users would be better using the
> settings you have above.
>
> Good luck!
>
> Ric

From rwheeler at redhat.com Wed Dec 8 13:26:27 2010
From: rwheeler at redhat.com (Ric Wheeler)
Date: Wed, 08 Dec 2010 08:26:27 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To: <0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com> <4CFFBB54.8010205@gmail.com> <0F5B06BAB751E047AB5C87D1F77A77887D1070C88B@GVW0547EXC.americas.hpqcorp.net>
Message-ID: <4CFF8783.60104@redhat.com>

On 12/08/2010 02:02 PM, Miller, Mike (OS Dev) wrote:
>
>> -----Original Message-----
>> From: ext3-users-bounces at redhat.com [mailto:ext3-users-bounces at redhat.com] On Behalf Of Ric Wheeler
>> Sent: Wednesday, December 08, 2010 11:08 AM
>> To: Richard Hipp
>> Cc: ext3-users at redhat.com
>> Subject: Re: SQLite and ext3 journalling mode
>>
>> On 12/08/2010 11:56 AM, Richard Hipp wrote:
>>>
>>> On Wed, Dec 8, 2010 at 11:25 AM, Ric Wheeler
>>> wrote:
>>>
>>> On 12/08/2010 06:52 AM, Dan Kennedy wrote:
>>>
>>>
>>> Thanks. But to be clear, is data=ordered better than data=writeback
>>> wrt. data integrity following a power failure?
>>>
>>> Regards,
>>> Dan.
>>>
>>>
>>> Data integrity can mean a couple of different things.
>>>
>>> If you are file system meta-data centric (i.e., a file system developer or
>>> just worried about having to run fsck after a crash to repair the file
>>> system), then both options *should* be equivalent.
>>>
>>> If you are one of those annoying users who define data integrity to
>>> include those annoying details like will my file have garbage in it after
>>> a crash that will make my DB or other app puke, then data ordered is
>>> clearly more robust.
>>>
>>>
>>> Thanks, Ric. Yes, we are numbered among the "annoying users". Based on what
>>> you are telling us, we'll recommend that people use data=ordered, barrier=1
> Just as an FYI, not all HW vendors enable the drive write cache especially on array controllers. In those cases barriers do nothing.
>
> -- mikem
>
>

Right - upstream has been working to make sure that we can default to barriers on and not see a performance hit for devices like arrays that don't need them ...

Ric

From tytso at mit.edu Wed Dec 8 22:30:15 2010
From: tytso at mit.edu (Ted Ts'o)
Date: Wed, 8 Dec 2010 17:30:15 -0500
Subject: SQLite and ext3 journalling mode
In-Reply-To:
References: <4CFD2E80.5090203@gmail.com> <20101206213102.GA24607@thunk.org> <4CFF7180.1030607@gmail.com> <4CFFB162.9030306@gmail.com>
Message-ID: <20101208223015.GD2921@thunk.org>

On Wed, Dec 08, 2010 at 11:56:21AM -0500, Richard Hipp wrote:
>
> Thanks, Ric. Yes, we are numbered among the "annoying users". Based on
> what you are telling us, we'll recommend that people use data=ordered,
> barrier=1 for maximum data reliability in the face of power loss.

Note that given that SQLite uses fsync() properly, data=ordered shouldn't hurt or help --- with respect to the very narrow issue of data integrity in the SQLite database.
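Ted's argument rests on the application calling fsync() at the right points. As a generic illustration only (this is not SQLite's commit protocol, which uses its own rollback journal or WAL file), here is the common write-to-a-temporary-file, fsync, rename, fsync-the-directory pattern in Python for POSIX systems such as Linux; the function name and the ".tmp" suffix are hypothetical. Whether fsync() actually reaches stable media still depends on the drive's write cache and on barriers being honoured, as discussed earlier in the thread.

```python
import os


def durable_replace(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so that, once this returns,
    both the new contents and the rename have been fsync()ed (POSIX/Linux)."""
    tmp_path = path + ".tmp"    # hypothetical naming scheme for the scratch file
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's userspace buffer into the kernel
        os.fsync(f.fileno())    # ask the kernel to write the file to the device
    os.replace(tmp_path, path)  # atomically rename over the old file
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)        # make the directory entry (the rename) durable too
    finally:
        os.close(dir_fd)
```

A reader of `path` sees either the old contents or the new ones, never a half-written file, which is the property an application needs regardless of the data= mode.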
What data=ordered protects against is the possibility that stale data (from a deleted file, possibly some other user's) might show up as "garbage" in a file whose newly allocated blocks didn't get written out to disk before a system crash or other unclean shutdown. To the extent that SQLite properly uses fsync() and doesn't pay attention to data in parts of the db file that haven't been committed yet, this won't matter.

Users who consider stale data showing up in a file to be a potential security issue should definitely use data=ordered in ext3. However, if you have a specific use case where the only thing that matters is SQLite, then assuming that fsync() is being used properly, SQLite itself should be OK.

I hope this helps,

- Ted

From florian.weber at bn-paf.de Mon Dec 27 16:38:09 2010
From: florian.weber at bn-paf.de (Florian Weber)
Date: Mon, 27 Dec 2010 16:38:09 -0000
Subject: Overwritten beginning of ext3 filesystem. Recovery?
Message-ID: <20101227173751.qmvcc2ec8cgwwk0k@mail.bn-paf.de>

Hello list

I accidentally trashed the first ~10-20GB of a 1TB ext3 filesystem with a heedless RAID1 rebuild (excruciating detail below). I'm now looking for options to get as much as possible of the remaining data back.

I've been searching the web for over a day now, but all my results are either not what I need (MBR, partition table and superblock are OK) or too low-level (recovering many thousands of nameless and structureless mails/jpgs/docs just doesn't cut it here, IMVHO).

My main problem is not that I accidentally deleted files, but that basically my / directory just went "poof" and left the rest sitting around. Since the damaged filesystem was clean before my accident, I'm figuring I just might get most of the data back: even much of the directory structure should still be there if I only knew how to get at it.

I'd be most grateful for any tips, tools, or even documentation to aid in writing my own tool.
Thanks in advance for your time,
Florian Weber

PS: that _was_ my backup :-( Thanks for not mentioning it.

-------------
Details:

Starting point:
---------------
I've been running the following setup on my machine:
* Two same-size hard disks, currently 1TB, one big partition each --> sda[1], sdb[1]
* Linux software RAID1 consisting of these partitions --> md0
* A single ext3 filesystem, default parameters, reserved blocks lowered to 1%
* All system and data inside this single partition, ca. 350-400GB
* (Much too) infrequent backups ... yes, yes, I know, I know ...

Intentions:
-----------
After many years, I wanted to move from Gentoo to Kubuntu. No big deal:
* Shut down PC, pull disk sdb from the RAID
* Install Ubuntu on sda as if working on a blank disk (setup as above, with one of the RAID1 disks physically missing during the install)
* Boot the new system from sda, still in degraded mode
* Treat sdb like a standalone ext3 disk: mount, copy configs and /home, umount
* Get the system into working order (config files reconciled, all applications running)
* Determine that the "old" stuff is not needed anymore
* Put sdb back into the RAID1 and rebuild

What went wrong:
----------------
Before the initial shutdown, I did not change the partition type on sdb from 0xFD to 0x83 to prevent RAID autodetection. Booting with sdb reattached (to get at my personal data) would therefore (correctly) have resulted in a RAID rebuild --> very bad.

So I figured: I'll attach the disk, boot with "raid=noautodetect" on the kernel command line, and I'll be fine.

But: unlike my previous setup, Ubuntu has a silent bootloader and I missed my chance to enter the command line. And the RAID instantly started rebuilding itself onto my backup disk :-O

I quickly realised what was happening and cleanly shut down my system (incurring some additional damage from the running rebuild, but the worst was already done).
Total running time was about 3 minutes, in parallel to the system booting up and shutting down.

What I have now:
----------------
* A working, new Ubuntu installation on a degraded RAID1 array, without personal data. I'm currently typing on this system.
* A hard disk (sdb) that previously contained a working system with a total of 350-400GB data, but was subject to a RAID1 rebuild for <3-4 minutes at <=100MB/sec. The disk is not connected at the moment.
* The MBR on sdb is the new one. That's OK.
* The partition table on sdb is the new one. It looks identical to the old one.
* The ext3 superblock on sdb1 is the new one. It's basically the same as the old one. I compared it against one of the (old) backup superblocks at the end of the partition.
* I have a dd image of partition sdb1
* I can mount the image of sdb1 and do an ls. I see data from the new system. Much content is missing, obviously, since it was not synced over yet
* I can "fsck -n" the image of sdb1. Many errors of course ("inode contains invalid block", "too many illegal blocks", "i_size wrong", "i_Blocks wrong"), since much stuff was not synced over yet
* At some point, "fsck -n" stops with "illegal indirect block"
* I have not yet tried to "fsck -y". That would be my next step.
* I have 1TB of free space available and can organise more

I do realise this is not for the faint of heart, but I'm done with my fainting for this instance ;-)

Still with hope,
Florian Weber

-------------------------------------------------------
Buergernetz Pfaffenhofen Webmail - http://www.bn-paf.de
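Florian's hunch that much of the directory structure should still be on disk points at one way a home-grown tool could start: the first block of every ext2/ext3 directory begins with a "." entry followed by a ".." entry, so such blocks can be spotted in a raw dd image even when the inode tables that referenced them were overwritten, and each hit yields the directory's own inode number and its parent's. The sketch below is a minimal illustration, assuming 4 KiB blocks and the classic directory entry layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name); the function names are hypothetical and this is not an existing recovery tool.

```python
import struct
from typing import Iterator, Tuple

BLOCK_SIZE = 4096   # assumption: the file system uses 4 KiB blocks
EXT2_FT_DIR = 2     # file_type value for a directory (filetype feature)


def _dir_entry(block: bytes, off: int) -> Tuple[int, int, int, bytes]:
    """Decode one ext2/ext3 directory entry: (inode, rec_len, file_type, name)."""
    inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, off)
    name = block[off + 8:off + 8 + name_len]
    return inode, rec_len, file_type, name


def find_directory_blocks(image, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, int, int]]:
    """Yield (block_number, dir_inode, parent_inode) for blocks starting with '.' and '..'."""
    for blkno in range(len(image) // block_size):
        block = image[blkno * block_size:(blkno + 1) * block_size]
        ino, rec_len, ftype, name = _dir_entry(block, 0)
        # The "." entry is always 12 bytes long and points at the directory itself;
        # file_type 0 allows for file systems without the filetype feature.
        if ino == 0 or rec_len != 12 or name != b"." or ftype not in (0, EXT2_FT_DIR):
            continue
        pino, prec_len, pftype, pname = _dir_entry(block, 12)
        # ".." follows immediately and its record must fit inside the block.
        if pino == 0 or pname != b".." or pftype not in (0, EXT2_FT_DIR):
            continue
        if prec_len < 12 or 12 + prec_len > block_size:
            continue
        yield blkno, ino, pino
```

Only the first block of each directory is found this way, and the inode numbers still have to be matched against whatever inode tables survived. For a 1TB image, open the file in binary mode and pass `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` instead of a bytes object; slicing an mmap returns bytes, so the function works unchanged.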