From patrik at hornik.sk  Fri Apr 18 16:56:57 2014
From: patrik at hornik.sk (=?ISO-8859-1?Q?Patrik_Horn=EDk?=)
Date: Fri, 18 Apr 2014 18:56:57 +0200
Subject: Many orphaned inodes after resize2fs
Message-ID: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>

Hello,

Yesterday I ran into the following problem with my ext3 filesystem:

- I had an ext3 filesystem a few TB in size, with a journal. I unmounted it
properly and it was marked clean.

- I then ran fsck.ext3 -f on it and it did not find any problems.

- After increasing the size of its LVM volume by 1.5 TB, I resized the
filesystem with resize2fs lvm_volume, which finished without problems.

- But fsck.ext3 -f immediately after that showed "Inodes that were part of
a corrupted orphan linked list found." and many thousands of "Inode XXX was
part of the orphaned inode list." I did not accept the fix. According to
debugfs, all of the reported orphaned inodes that I checked (only some from
the beginning of the list of errors) have size 0.

- When I mount the fs read-only, the data I was able to check seems OK. (But
I am unable to check everything.)

- I created an LVM snapshot and repaired the fs on it with fsck.ext3. After
that there were no files in lost+found. Does that mean that all of those
orphaned inodes have size 0? Or in which cases does fsck not create files
in lost+found?

- I am checking the data against various backups, but I will not be able to
check everything, and some less important data has no backup. So I would
like to know what state the fs is in and what the best next steps are.

- Right now I am planning to use the current LVM snapshot as a test run and
discard it after the data check. The original fs is in the state just after
resize2fs; fsck was run on it after that, but I did not accept any fix and
cancelled the check. I then plan to create a backup snapshot, fsck the
original fs / LVM volume, check once again against backups and go with it.
But this will not tell me the status of all my data and the fs, or whether
it is safe to use. Another problem is that all operations take many hours.

- I also have some specific technical questions. An orphan inode is a valid
inode not found in any directory, right? What exactly is a CORRUPTED orphan
linked list? What can cause such a problem? Is it a known problem? How can
orphaned inodes and a corrupted orphan linked list be created by resize2fs,
or why were they not detected by fsck.ext3 before that? Can it be serious,
and can it be a symptom of data loss? Can fixing it with fsck.ext3 corrupt
other data that looks OK now, when I mount the fs read-only?

- The platform was the latest stable Debian with kernel
linux-image-3.2.0-4-amd64 version 3.2.46-1+deb7u1 and e2fsprogs 1.42.5-1.1.
After the incident I started using linux-image-3.13-1-amd64 version
3.13.7-1 (from the point of creating the snapshot and running fsck for real
on it), and I am thinking about moving to e2fsprogs 1.42.9 built from
source.

Thank you very much.

Patrik

From tytso at mit.edu  Fri Apr 18 20:20:30 2014
From: tytso at mit.edu (tytso at mit.edu)
Date: Fri, 18 Apr 2014 20:20:30 +0000
Subject: Many orphaned inodes after resize2fs
In-Reply-To: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
References: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
Message-ID: <20140418202030.GB13642@thunk.org>

On Fri, Apr 18, 2014 at 06:56:57PM +0200, Patrik Horník wrote:
> 
> Yesterday I ran into the following problem with my ext3 filesystem:
> 
> - I had an ext3 filesystem a few TB in size, with a journal. I unmounted it
> properly and it was marked clean.
> 
> - I then ran fsck.ext3 -f on it and it did not find any problems.
> 
> - After increasing the size of its LVM volume by 1.5 TB, I resized the
> filesystem with resize2fs lvm_volume, which finished without problems.
> 
> - But fsck.ext3 -f immediately after that showed "Inodes that were part of
> a corrupted orphan linked list found." and many thousands of "Inode XXX was
> part of the orphaned inode list." I did not accept the fix. According to
> debugfs, all of the reported orphaned inodes that I checked (only some from
> the beginning of the list of errors) have size 0.

Can you send the output of dumpe2fs -h?  I'm curious how many inodes
you had after the resize, and what file system features might have
been enabled on your file system.

If the only file system corruption errors that you saw were about the
corrupted orphan inode list, then things are probably OK.

What this error message means is that there are i_dtime values which
look like inode numbers (as opposed to a number of seconds since
January 1, 1970).  So if you ran the system with the clock set
incorrectly, so that the time was January 1, 1970, and you deleted a
lot of files, you can run into this error --- it's basically a sanity
check that we put in a long time ago to catch potential file system
bugs caused by a corrupted orphan inode list.

I'm thinking that we should turn off this check if the e2fsck.conf
"broken_system_clock" option is enabled, since if the system has a
busted system clock, this can end up triggering a bunch of scary
warnings.

In any case, when you grew the file system, this also increased the
number of inodes, which increases the likelihood of hitting this bug.
It's also possible that you created your file system with the number
of inodes per block group close to the maximum (assuming an average
file size of 4k, which would be highly wasteful of space, so it's not
the default), and so ended up with the number of inodes exceeding 1.2
or 1.3 billion, at which point the check would trigger a false
positive.  (And indeed, I should probably put a fix into e2fsprogs so
that this check is disabled if a file system has more than 1.2 billion
inodes.)
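
To make the false positive concrete, here is a minimal Python sketch of
the heuristic as described above. It is an illustration only, not the
e2fsck source: the function name is made up, and the only inputs are an
inode's deletion time and the file system's total inode count.

```python
def looks_like_orphan_link(dtime: int, inodes_count: int) -> bool:
    """Return True if a nonzero dtime is small enough to be an inode number.

    While an inode sits on the orphan list, its dtime field holds the number
    of the next inode in the list, so a dtime below the inode count is
    treated as a leftover list link rather than a timestamp.
    """
    return dtime != 0 and dtime < inodes_count


# A file deleted in early 2009 (seconds since the epoch).
deleted_2009 = 1_233_000_000

# With a few hundred million inodes the value is clearly a timestamp.
print(looks_like_orphan_link(deleted_2009, 300_000_000))    # False
# Past roughly 1.2-1.3 billion inodes, real timestamps fall below the count.
print(looks_like_orphan_link(deleted_2009, 1_300_000_000))  # True
```

The larger the inode count, the more recent the deletion times that get
mistaken for list links.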

Cheers,

						- Ted


From patrik at hornik.sk  Fri Apr 18 23:20:40 2014
From: patrik at hornik.sk (=?ISO-8859-1?Q?Patrik_Horn=EDk?=)
Date: Sat, 19 Apr 2014 01:20:40 +0200
Subject: Many orphaned inodes after resize2fs
In-Reply-To: <20140418202030.GB13642@thunk.org>
References: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
	<20140418202030.GB13642@thunk.org>
Message-ID: <CAAOsTSkEG14zmyt-etkS58YR9oR8fdggbXfdE68DXrc3BphuUA@mail.gmail.com>

Hi,

it seems you got it right! I don't know if you read the email I sent you
before posting to the mailing list, but I accidentally diagnosed the
cause... :) I noticed that the inodes fsck warned me about, at least the
ones I checked, all have all four timestamps no later than 2010...

The filesystem has a maximum of 1281998848 inodes, which as a timestamp
falls in August 2010. I don't know how it got that big; I don't think I
specified a large value initially. But I have resized it a couple of times.
BTW, what is the default group size / inode count ratio? My ratio is not at
the maximum you mentioned, but it is not that far off.
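
The coincidence is easy to check with the Python standard library. This is
just a quick sketch; the only input is the inode count quoted above,
interpreted as seconds since the Unix epoch.

```python
from datetime import datetime, timezone

inodes_count = 1281998848  # maximum inode count reported for this filesystem

# Interpret the inode count as seconds since the Unix epoch.
as_time = datetime.fromtimestamp(inodes_count, tz=timezone.utc)
print(as_time.isoformat())  # 2010-08-16T22:47:28+00:00
```

Any deletion time earlier than that moment is numerically smaller than the
inode count, which would explain why only inodes with timestamps up to 2010
were reported.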

So I am almost sure it is a false positive from the code / bug in
e2fsck/pass1.c around line 1070 in the current version. I want to be sure
that all these errors were caused by this, so could you please send me a
patched version soon? I could easily patch it myself with some fixed
condition, but I don't want to miss something important... BTW, maybe you
could compare i_dtime with the filesystem creation timestamp, so you don't
have to put a fixed number there.

BTW, I don't know the specifics of ext3; I have only just looked at the
sources of the kernel driver and e2fsprogs. But what indicates that an
inode is / was created and valid? (I did not need it to find the
problematic test you mentioned, did not see it in the part of the code I
looked at, and it is not apparent to me from the definition of struct
ext3_inode.)

Thanks.

Patrik



From patrik at hornik.sk  Sat Apr 19 15:45:49 2014
From: patrik at hornik.sk (=?ISO-8859-1?Q?Patrik_Horn=EDk?=)
Date: Sat, 19 Apr 2014 17:45:49 +0200
Subject: Many orphaned inodes after resize2fs
In-Reply-To: <CAAOsTSkEG14zmyt-etkS58YR9oR8fdggbXfdE68DXrc3BphuUA@mail.gmail.com>
References: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
	<20140418202030.GB13642@thunk.org>
	<CAAOsTSkEG14zmyt-etkS58YR9oR8fdggbXfdE68DXrc3BphuUA@mail.gmail.com>
Message-ID: <CAAOsTSnT5KaebtAuKT=q0MHFUAT=z1s4Xe=isZ1fiub414KTAA@mail.gmail.com>

OK,

so I patched it myself and confirmed that all the errors were caused by
this, because the patched version does not warn about any inode, or about
anything else. Thus it seems resize2fs did not harm the filesystem at all
and it was all due to the false positive in e2fsck.

I patched it so that an inode is considered part of a suspected corrupted
orphan list only if its i_dtime is lower than a specific constant of around
1.1 billion. That corresponds to some time before the creation of my
filesystem. You can find the patch against 1.42.9 attached.
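
The attached patch is not reproduced in the archive, so the following is
only a Python sketch of the idea under stated assumptions: the cutoff of
1,100,000,000 stands in for the "around 1.1 billion" constant, the function
name is hypothetical, and whether the real patch replaces or supplements
the existing comparison is not known from this message.

```python
# Assumed cutoff: "around 1.1 billion", i.e. a point in time (late 2004)
# before this filesystem was created. The exact value in the patch is unknown.
DTIME_CUTOFF = 1_100_000_000


def suspect_orphan_link(dtime: int) -> bool:
    """Flag a nonzero dtime only if it predates the fixed cutoff."""
    return dtime != 0 and dtime < DTIME_CUTOFF


print(suspect_orphan_link(1_233_000_000))  # False: a plausible deletion time
print(suspect_orphan_link(50_000))         # True: far too small to be a date
```

A fixed cutoff like this only suits a filesystem known to be younger than
the cutoff, which is why it is a private workaround rather than a general
fix.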

Please confirm that this is a fully correct solution (for my purposes, not
an elegant, clean official fix) and that it has no negative consequences.
It seems that way, but I did not analyze all the code paths the fixed code
is in.

BTW, were there any other negative consequences of this bug in e2fsck
besides changing the i_dtime of the inodes to the current time?

Thanks.

Patrik

-------------- next part --------------
A non-text attachment was scrubbed...
Name: big-fs-fsck-fix.patch
Type: application/octet-stream
Size: 584 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20140419/edfeaa6d/attachment.obj>

From tytso at mit.edu  Sat Apr 19 15:48:25 2014
From: tytso at mit.edu (Theodore Ts'o)
Date: Sat, 19 Apr 2014 11:48:25 -0400
Subject: Many orphaned inodes after resize2fs
In-Reply-To: <CAAOsTSn8v3XmG-ri7Om+=em8zZVQjecdf5A0ng8jhZsntAWfCg@mail.gmail.com>
References: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
	<20140418202030.GB13642@thunk.org>
	<CAAOsTSkEG14zmyt-etkS58YR9oR8fdggbXfdE68DXrc3BphuUA@mail.gmail.com>
	<CAAOsTSn8v3XmG-ri7Om+=em8zZVQjecdf5A0ng8jhZsntAWfCg@mail.gmail.com>
Message-ID: <20140419154825.GC31552@thunk.org>

On Sat, Apr 19, 2014 at 05:42:12PM +0200, Patrik Horník wrote:
> 
> Please confirm that this is a fully correct solution (for my purposes, not
> an elegant, clean official fix) and that it has no negative consequences.
> It seems that way, but I did not analyze all the code paths the fixed code
> is in.

Yes, that's a fine solution.  What I'll probably do is disable the
check if s_inodes_count is greater than s_mkfs_time minus some fudge
value, or if the broken system clock boolean is set.
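
As a rough illustration of that condition, here is a Python sketch. It is
not the eventual e2fsprogs change: the function name is hypothetical, the
one-day fudge default is an assumption (no value is given here), and the
creation time in the example is made up.

```python
def low_dtime_check_enabled(inodes_count: int, mkfs_time: int,
                            broken_system_clock: bool,
                            fudge: int = 86400) -> bool:
    """Decide whether the low-dtime sanity check should run at all.

    fudge is an assumed safety margin in seconds.
    """
    if broken_system_clock:
        return False
    # If inode numbers can reach values near or beyond the creation time,
    # deletion times and inode numbers can no longer be told apart.
    return inodes_count <= mkfs_time - fudge


mkfs_time = 1_200_000_000  # hypothetical creation time (January 2008)

print(low_dtime_check_enabled(300_000_000, mkfs_time, False))  # True
print(low_dtime_check_enabled(1281998848, mkfs_time, False))   # False
print(low_dtime_check_enabled(300_000_000, mkfs_time, True))   # False
```

Tying the threshold to the creation time avoids hard-coding an inode count
such as 1.2 billion.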

> BTW, were there any other negative consequences of this bug in e2fsck
> besides changing the i_dtime of the inodes to the current time?

Nope, that would be the only consequence --- if you don't count the
system administrator's anxiety that was induced by the false positive!

Thanks for pointing out this problem.  I'll make sure it gets fixed in
the next maintenance release of e2fsprogs.

					- Ted


From patrik at hornik.sk  Sat Apr 19 16:54:03 2014
From: patrik at hornik.sk (=?ISO-8859-1?Q?Patrik_Horn=EDk?=)
Date: Sat, 19 Apr 2014 18:54:03 +0200
Subject: Many orphaned inodes after resize2fs
In-Reply-To: <20140419154825.GC31552@thunk.org>
References: <CAAOsTSnv-Rt-2BTw+v=sKiOFv74edhv4AUVko9wE3+hHQ1-G+Q@mail.gmail.com>
	<20140418202030.GB13642@thunk.org>
	<CAAOsTSkEG14zmyt-etkS58YR9oR8fdggbXfdE68DXrc3BphuUA@mail.gmail.com>
	<CAAOsTSn8v3XmG-ri7Om+=em8zZVQjecdf5A0ng8jhZsntAWfCg@mail.gmail.com>
	<20140419154825.GC31552@thunk.org>
Message-ID: <CAAOsTSktc2faYp5x_-3OcT3_DU81mHSQaeXTn1EoixmOLNX98A@mail.gmail.com>

2014-04-19 17:48 GMT+02:00 Theodore Ts'o <tytso at mit.edu>:

> On Sat, Apr 19, 2014 at 05:42:12PM +0200, Patrik Horník wrote:
> >
> > Please confirm that this is a fully correct solution (for my purposes,
> > not an elegant, clean official fix) and that it has no negative
> > consequences. It seems that way, but I did not analyze all the code
> > paths the fixed code is in.
>
> Yes, that's a fine solution.  What I'll probably do is disable the
> check if s_inodes_count is greater than s_mkfs_time minus some fudge
> value, or if the broken system clock boolean is set.
>
> > BTW, were there any other negative consequences of this bug in e2fsck
> > besides changing the i_dtime of the inodes to the current time?
>
> Nope, that would be the only consequence --- if you don't count the
> system administrator's anxiety that was induced by the false positive!
>

Indeed, the first couple of hours were no fun, until I confirmed that the
data seems OK by comparing some of it to a backup :)

From now on we will resize and fsck the fs only with backup LVM snapshots
in place. Approximately how much data is overwritten / moved when resizing
a fs?


> Thanks for pointing out this problem.  I'll make sure it gets fixed in
> the next maintenance release of e2fsprogs.
>
>                                         - Ted


Thanks for your prompt assistance.

Patrik

