From Rarul at novell.com Sun Jun 2 09:36:43 2013
From: Rarul at novell.com (Arul Selvan)
Date: Sun, 02 Jun 2013 03:36:43 -0600
Subject: No subject
Message-ID: <51AB5F83020000A1000798C7@soto.provo.novell.com>

From Rarul at novell.com Mon Jun 3 05:33:39 2013
From: Rarul at novell.com (Arul Selvan)
Date: Sun, 02 Jun 2013 23:33:39 -0600
Subject: Write ordering in Ext4
Message-ID: <51AC780B020000A10007996A@soto.provo.novell.com>

Greetings. I am Arul Selvan and I work for Novell. I am exploring the Ext4
architecture; more specifically, I would like to understand write ordering:
when the same block is modified more than once, how are the writes ordered?
Could you point me to the documentation or the specific source file to look at?

From adilger at dilger.ca Mon Jun 3 14:47:38 2013
From: adilger at dilger.ca (Andreas Dilger)
Date: Mon, 3 Jun 2013 08:47:38 -0600
Subject: Write ordering in Ext4
In-Reply-To: <51AC780B020000A10007996A@soto.provo.novell.com>
References: <51AC780B020000A10007996A@soto.provo.novell.com>
Message-ID: <7A03DC69-8FCA-445F-AF4E-8722DBE88CA0@dilger.ca>

On 2013-06-02, at 23:33, "Arul Selvan" wrote:
> Greetings. I am Arul Selvan and I work for Novell. I am exploring the Ext4
> architecture; more specifically, I would like to understand write ordering:
> when the same block is modified more than once, how are the writes ordered?
> Could you point me to the documentation or the specific source file to look at?

Writes in memory to the same file are serialized by i_mutex, but may
modify the same page in memory repeatedly.

When that page is being written to disk, it will be marked with the
page writeback flag, in order to stabilize the content and allow consistent
checksums (e.g. for MD RAID or disks with T10-DIF). This may block
any further writes from modifying the same page as it is being
submitted to disk, depending on the kernel version and the
requirements of the underlying storage.
Once the disk write has been finished, the writeback bit is cleared and
the page can be modified again.

In all cases, the writes to a single page are ordered, but there is no
_guarantee_ about writes to different data blocks being ordered.
The ext4 journal will in fact impose some order on data writes,
by ensuring that the data from all writes associated with a transaction
are flushed before the data for the next transaction.

Since fsync() of any file commits the current transaction, this has
the side-effect that any fsync causes all older writes to be committed.
This is NOT required by POSIX, and applications that depend on this
behavior are not portable to/safe on other filesystems.

Cheers, Andreas

From Rarul at novell.com Tue Jun 4 17:17:38 2013
From: Rarul at novell.com (Arul Selvan)
Date: Tue, 04 Jun 2013 11:17:38 -0600
Subject: Write ordering in Ext4
In-Reply-To: <7A03DC69-8FCA-445F-AF4E-8722DBE88CA0@dilger.ca>
References: <51AC780B020000A10007996A@soto.provo.novell.com> <7A03DC69-8FCA-445F-AF4E-8722DBE88CA0@dilger.ca>
Message-ID: <51AE6E8A020000A100079C6F@soto.provo.novell.com>

Thanks, that answered my question. One more question: is it possible to
stop the delayed block allocation in ext4?

>>> Andreas Dilger 6/3/2013 8:17 PM >>>
On 2013-06-02, at 23:33, "Arul Selvan" wrote:
> Greetings. I am Arul Selvan and I work for Novell. I am exploring the Ext4
> architecture; more specifically, I would like to understand write ordering:
> when the same block is modified more than once, how are the writes ordered?
> Could you point me to the documentation or the specific source file to look at?

Writes in memory to the same file are serialized by i_mutex, but may
modify the same page in memory repeatedly.

When that page is being written to disk, it will be marked with the
page writeback flag, in order to stabilize the content and allow consistent
checksums (e.g. for MD RAID or disks with T10-DIF).
This may block any further writes from modifying the same page as it is
being submitted to disk, depending on the kernel version and the
requirements of the underlying storage. Once the disk write has been
finished, the writeback bit is cleared and the page can be modified again.

In all cases, the writes to a single page are ordered, but there is no
_guarantee_ about writes to different data blocks being ordered.
The ext4 journal will in fact impose some order on data writes,
by ensuring that the data from all writes associated with a transaction
are flushed before the data for the next transaction.

Since fsync() of any file commits the current transaction, this has
the side-effect that any fsync causes all older writes to be committed.
This is NOT required by POSIX, and applications that depend on this
behavior are not portable to/safe on other filesystems.

Cheers, Andreas

From sandeen at redhat.com Tue Jun 4 17:33:25 2013
From: sandeen at redhat.com (Eric Sandeen)
Date: Tue, 04 Jun 2013 12:33:25 -0500
Subject: Write ordering in Ext4
In-Reply-To: <51AE6E8A020000A100079C6F@soto.provo.novell.com>
References: <51AC780B020000A10007996A@soto.provo.novell.com> <7A03DC69-8FCA-445F-AF4E-8722DBE88CA0@dilger.ca> <51AE6E8A020000A100079C6F@soto.provo.novell.com>
Message-ID: <51AE24E5.8050908@redhat.com>

On 6/4/13 12:17 PM, Arul Selvan wrote:
> Thanks, that answered my question. One more question: is it possible to
> stop the delayed block allocation in ext4?

If you mean turn off delayed allocation, look no further than the mount
options documented in the kernel tree, Documentation/filesystems/ext4.txt:

    nodelalloc    Disable delayed allocation. Blocks are allocated
                  when the data is copied from userspace to the
                  page cache, either via the write(2) system call
                  or when an mmap'ed page which was previously
                  unallocated is written for the first time.

Out of curiosity, why do you want to turn off delalloc?

-Eric

>>>> Andreas Dilger 6/3/2013 8:17 PM >>>
> On 2013-06-02, at 23:33, "Arul Selvan" wrote:
>> Greetings. I am Arul Selvan and I work for Novell. I am exploring the Ext4
>> architecture; more specifically, I would like to understand write ordering:
>> when the same block is modified more than once, how are the writes ordered?
>> Could you point me to the documentation or the specific source file to look at?
>
> Writes in memory to the same file are serialized by i_mutex, but may
> modify the same page in memory repeatedly.
>
> When that page is being written to disk, it will be marked with the
> page writeback flag, in order to stabilize the content and allow consistent
> checksums (e.g. for MD RAID or disks with T10-DIF). This may block
> any further writes from modifying the same page as it is being
> submitted to disk, depending on the kernel version and the
> requirements of the underlying storage. Once the disk write has been
> finished, the writeback bit is cleared and the page can be modified again.
>
> In all cases, the writes to a single page are ordered, but there is no
> _guarantee_ about writes to different data blocks being ordered.
> The ext4 journal will in fact impose some order on data writes,
> by ensuring that the data from all writes associated with a transaction
> are flushed before the data for the next transaction.
>
> Since fsync() of any file commits the current transaction, this has
> the side-effect that any fsync causes all older writes to be committed.
> This is NOT required by POSIX, and applications that depend on this
> behavior are not portable to/safe on other filesystems.
>
> Cheers, Andreas
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>

From tytso at mit.edu Tue Jun 4 19:08:59 2013
From: tytso at mit.edu (Theodore Ts'o)
Date: Tue, 4 Jun 2013 15:08:59 -0400
Subject: Write ordering in Ext4
In-Reply-To: <51AE6E8A020000A100079C6F@soto.provo.novell.com>
References: <51AC780B020000A10007996A@soto.provo.novell.com> <7A03DC69-8FCA-445F-AF4E-8722DBE88CA0@dilger.ca> <51AE6E8A020000A100079C6F@soto.provo.novell.com>
Message-ID: <20130604190859.GS3030@thunk.org>

On Tue, Jun 04, 2013 at 11:17:38AM -0600, Arul Selvan wrote:
> Thanks, that answered my question. One more question: is it possible
> to stop the delayed block allocation in ext4?

The fact that you are asking all of these questions is making me very
nervous. Why do you care? Application programmers should ***not*** be
depending on low-level file system behavior. If you care about what
might happen after a crash, you need to use fsync().

- Ted

From folkert.mobiel at gmail.com Tue Jun 25 07:13:20 2013
From: folkert.mobiel at gmail.com (Folkert van Heusden)
Date: Tue, 25 Jun 2013 09:13:20 +0200
Subject: removing external journal
Message-ID:

Hi,

I have a system with an ext4 filesystem with its journal on another
device (an SSD).
Now this SSD dropped off the SATA bus so the filesystem went r/o.
I would like to remove the journal but it says it can't because
needs_check is set. I can't run fsck because the journal is not
reachable. Is there any way to solve this? I understand I lost any
pending changes in the journal.

regards

--
www.vanheusden.com

From sandeen at redhat.com Wed Jun 26 15:38:17 2013
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 26 Jun 2013 11:38:17 -0400
Subject: removing external journal
In-Reply-To:
References:
Message-ID: <51CB0AE9.3000306@redhat.com>

On 6/25/13 3:13 AM, Folkert van Heusden wrote:
> Hi,
>
> I have a system with an ext4 filesystem with its journal on another
> device (an SSD).
> Now this SSD dropped off the SATA bus so the filesystem went r/o.
> I would like to remove the journal but it says it can't because
> needs_check is set.

What does it actually say? There is no needs_check flag AFAIK.

> I can't run fsck because the journal is not
> reachable. Is there any way to solve this? I understand I lost any
> pending changes in the journal.

Can you use debugfs to change the state of the fs so that it allows you
to do this?

Or maybe specifying a zero-filled file as the external journal would
allow fsck to proceed.

-Eric

From adilger at dilger.ca Wed Jun 26 22:12:22 2013
From: adilger at dilger.ca (Andreas Dilger)
Date: Wed, 26 Jun 2013 16:12:22 -0600
Subject: removing external journal
In-Reply-To: <51CB0AE9.3000306@redhat.com>
References: <51CB0AE9.3000306@redhat.com>
Message-ID:

On 2013-06-26, at 9:38 AM, Eric Sandeen wrote:
> On 6/25/13 3:13 AM, Folkert van Heusden wrote:
>>
>> I have a system with an ext4 filesystem with its journal on another
>> device (an SSD).
>> Now this SSD dropped off the SATA bus so the filesystem went r/o.
>> I would like to remove the journal but it says it can't because
>> needs_check is set.
>
> What does it actually say? There is no needs_check flag AFAIK.

I think he means "needs_recovery" - EXT3_FEATURE_INCOMPAT_RECOVER
set when journal recovery is needed after an unclean shutdown.

What you want is to run (per the tune2fs(8) man page, maybe something
should go into the e2fsck(8) man page as well?):

    tune2fs -f -O ^has_journal /dev/XXX
    e2fsck -fp /dev/XXX
    tune2fs -J device=/dev/YYY /dev/XXX

This will remove the journal from the filesystem, run e2fsck to fix
whatever problems it finds, then re-add the journal device (if you have
one, otherwise just use "tune2fs -j" to add an internal journal).

Note that "tune2fs" reports the following for "-f":

    -f  Force the tune2fs operation to complete even in the face of
        errors. This option is useful when removing the has_journal
        filesystem feature from a filesystem which has an external
        journal (or is corrupted such that it appears to have an
        external journal), but that external journal is not available.

        WARNING: Removing an external journal from a filesystem which
        was not cleanly unmounted without first replaying the external
        journal can result in severe data loss and filesystem
        corruption.

If you can get the journal device back, it would be better, but if not
you don't really have any choice. Any corruption found should be not
much worse than running without any journal, and be limited to files/dirs
that were in the middle of being modified.

What is interesting is that the filesystem _should_ be able to
handle the loss of the journal device by first dropping the
RECOVER feature and instead marking itself dirty as it would for
ext2. It needs a full e2fsck after an unclean shutdown in either
case, though a clean shutdown would be totally safe, and no further
action would be needed if the journal device returned at some later
time. The journal device shouldn't make the filesystem _less_
reliable.

Cheers, Andreas

>> I can't run fsck because the journal is not
>> reachable. Is there any way to solve this? I understand I lost any
>> pending changes in the journal.
>
> Can you use debugfs to change the state of the fs so that it allows
> you to do this?
>
> Or maybe specifying a zero-filled file as the external journal would
> allow fsck to proceed.
>
> -Eric

Cheers, Andreas

From folkert.mobiel at gmail.com Thu Jun 27 07:57:39 2013
From: folkert.mobiel at gmail.com (Folkert van Heusden)
Date: Thu, 27 Jun 2013 09:57:39 +0200
Subject: removing external journal
In-Reply-To:
References: <51CB0AE9.3000306@redhat.com>
Message-ID:

Eric, Andreas,

>>> I have a system with an ext4 filesystem with its journal on another
>>> device (an SSD).
>>> Now this SSD dropped off the SATA bus so the filesystem went r/o.
>>> I would like to remove the journal but it says it can't because
>>> needs_check is set.
>>
>> What does it actually say? There is no needs_check flag AFAIK.
>
> I think he means "needs_recovery" - EXT3_FEATURE_INCOMPAT_RECOVER
> set when journal recovery is needed after an unclean shutdown.

Yes, that's the one.

> What you want is to run (per the tune2fs(8) man page, maybe something
> should go into the e2fsck(8) man page as well?):
> tune2fs -f -O ^has_journal /dev/XXX

That one was denied because of the needs_recovery.
Fortunately I was able to bring the SSD back to life and got it all to
recover! The problem was with an Intel motherboard with a couple of SATA
ports. Four of them are from a Marvell chipset and those give lots of
problems. I moved all HDDs and SSDs to the other ports and hopefully
that solves the instability.

> What is interesting is that the filesystem _should_ be able to
> handle the loss of the journal device by first dropping the
> RECOVER feature and instead marking itself dirty as it would for
> ext2.
> It needs a full e2fsck after an unclean shutdown in either
> case, though a clean shutdown would be totally safe, and no further
> action would be needed if the journal device returned at some later
> time. The journal device shouldn't make the filesystem _less_
> reliable.

Oh, funny thing: after powering up the system again, the system reported
that the filesystem was clean while there were changes pending in the
journal! After applying those changes, one filesystem was hosed, the
other was fine. Hooray for backups.

www.vanheusden.com

From sandeen at redhat.com Thu Jun 27 19:43:41 2013
From: sandeen at redhat.com (Eric Sandeen)
Date: Thu, 27 Jun 2013 15:43:41 -0400
Subject: removing external journal
In-Reply-To:
References: <51CB0AE9.3000306@redhat.com>
Message-ID: <51CC95ED.4070105@redhat.com>

On 6/27/13 3:57 AM, Folkert van Heusden wrote:
> Eric, Andreas,
>
>>>> I have a system with an ext4 filesystem with its journal on another
>>>> device (an SSD).
>>>> Now this SSD dropped off the SATA bus so the filesystem went r/o.
>>>> I would like to remove the journal but it says it can't because
>>>> needs_check is set.
>>>
>>> What does it actually say? There is no needs_check flag AFAIK.
>>
>> I think he means "needs_recovery" - EXT3_FEATURE_INCOMPAT_RECOVER
>> set when journal recovery is needed after an unclean shutdown.
>
> Yes, that's the one.
>
>> What you want is to run (per the tune2fs(8) man page, maybe something
>> should go into the e2fsck(8) man page as well?):
>> tune2fs -f -O ^has_journal /dev/XXX
>
> That one was denied because of the needs_recovery.

Even with the "-f" ?

Hum, that sounds like it might be a bug then. What version?

-Eric

From folkert.mobiel at gmail.com Thu Jun 27 20:37:56 2013
From: folkert.mobiel at gmail.com (Folkert van Heusden)
Date: Thu, 27 Jun 2013 22:37:56 +0200
Subject: removing external journal
In-Reply-To: <51CC95ED.4070105@redhat.com>
References: <51CB0AE9.3000306@redhat.com> <51CC95ED.4070105@redhat.com>
Message-ID:

>>>>> I have a system with an ext4 filesystem with its journal on another
>>>>> device (an SSD).
>>>>> Now this SSD dropped off the SATA bus so the filesystem went r/o.
>>>>> I would like to remove the journal but it says it can't because
>>>>> needs_check is set.
>>>>
>>>> What does it actually say? There is no needs_check flag AFAIK.
>>>
>>> I think he means "needs_recovery" - EXT3_FEATURE_INCOMPAT_RECOVER
>>> set when journal recovery is needed after an unclean shutdown.
>>
>> Yes, that's the one.
>>
>>> What you want is to run (per the tune2fs(8) man page, maybe something
>>> should go into the e2fsck(8) man page as well?):
>>> tune2fs -f -O ^has_journal /dev/XXX
>>
>> That one was denied because of the needs_recovery.
>
> Even with the "-f" ?

I did not dare to do that.

> Hum, that sounds like it might be a bug then. What version?

This is util-linux 2.20.1 from Debian package 2.20.1-5.4.

Folkert

--
www.vanheusden.com

From sandeen at redhat.com Thu Jun 27 20:41:02 2013
From: sandeen at redhat.com (Eric Sandeen)
Date: Thu, 27 Jun 2013 16:41:02 -0400
Subject: removing external journal
In-Reply-To:
References: <51CB0AE9.3000306@redhat.com> <51CC95ED.4070105@redhat.com>
Message-ID: <51CCA35E.9030200@redhat.com>

On 6/27/13 4:37 PM, Folkert van Heusden wrote:
>>>>>> I have a system with an ext4 filesystem with its journal on another
>>>>>> device (an SSD).
>>>>>> Now this SSD dropped off the SATA bus so the filesystem went r/o.
>>>>>> I would like to remove the journal but it says it can't because
>>>>>> needs_check is set.
>>>>>
>>>>> What does it actually say? There is no needs_check flag AFAIK.
>>>>
>>>> I think he means "needs_recovery" - EXT3_FEATURE_INCOMPAT_RECOVER
>>>> set when journal recovery is needed after an unclean shutdown.
>>>
>>> Yes, that's the one.
>>>
>>>> What you want is to run (per the tune2fs(8) man page, maybe something
>>>> should go into the e2fsck(8) man page as well?):
>>>> tune2fs -f -O ^has_journal /dev/XXX
>>>
>>> That one was denied because of the needs_recovery.
>>
>> Even with the "-f" ?
>
> I did not dare to do that.

Ah, ok. Never mind then. :)

>> Hum, that sounds like it might be a bug then. What version?
>
> This is util-linux 2.20.1 from Debian package 2.20.1-5.4.

Presumably it's tools from e2fsprogs not util-linux, but if you didn't
try tune2fs -f and find it to fail, I'm no longer concerned. :)

Thanks,
-Eric
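
The recovery sequence Andreas outlines in this thread (force-remove the
journal, run a full e2fsck, then add a journal again) can be sketched as a
small dry-run shell helper. This is only an illustration and was not posted
to the list: the function name plan_journal_reset is hypothetical, and
/dev/XXX and /dev/YYY are the thread's own placeholders. The helper only
prints the commands, so that they can be reviewed and then run by hand,
given the tune2fs(8) warning quoted above about data loss.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the tune2fs/e2fsck sequence from the
# thread instead of executing it. Nothing here touches a device.
plan_journal_reset() {
    fs_dev=${1:-}       # filesystem device (/dev/XXX in the thread)
    journal_dev=${2:-}  # optional new external journal (/dev/YYY in the thread)

    if [ -z "$fs_dev" ]; then
        echo "usage: plan_journal_reset FS_DEV [JOURNAL_DEV]" >&2
        return 1
    fi

    # Step 1: drop has_journal even though the external journal is missing.
    printf '%s\n' "tune2fs -f -O ^has_journal $fs_dev"

    # Step 2: full check, automatically fixing what can safely be fixed.
    printf '%s\n' "e2fsck -fp $fs_dev"

    # Step 3: re-add an external journal if one was named,
    # otherwise fall back to an internal journal.
    if [ -n "$journal_dev" ]; then
        printf '%s\n' "tune2fs -J device=$journal_dev $fs_dev"
    else
        printf '%s\n' "tune2fs -j $fs_dev"
    fi
}

# Print the plan for the thread's placeholder devices.
plan_journal_reset /dev/XXX /dev/YYY
```

Calling the helper with only the filesystem device prints "tune2fs -j" as
the last step, matching the internal-journal fallback mentioned in the
thread. Printing rather than executing is a deliberate choice here: the
first step is destructive if the journal still holds unreplayed changes.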