From daytooner at gmail.com  Fri Jan 17 16:32:48 2014
From: daytooner at gmail.com (Ken Bass)
Date: Fri, 17 Jan 2014 08:32:48 -0800
Subject: Very long delay for first write to big filesystem
Message-ID:

I asked about this a while back. It seems that this problem is getting much worse.

The problem/issue: there is a very long delay when my system does a write to the filesystem. The delay now is over 5 minutes (yes: minutes). This only happens on the first write after booting up the system, and only for large files - 1GB or more. This can be a serious problem, since all access to any hard disk is blocked and hangs until that first write gets going again.

The prevailing thought at the time was that the delay came from loading into memory the directory information while looking for free space, which I can now believe.

The filesystem in question is 7.5TB, with about 4TB used. There are over 250,000 files. I also have another system with 1TB total and 400GB used, with 65,000 files. This system, the smaller one, is beginning to show delays as well, although only a few seconds.

This problem seems to involve several factors: the total size of the filesystem; the current "fragmentation" of that filesystem; and finally the amount of physical memory available.

As to the last factor, the 7.5TB system has only 2GB of memory (I didn't think it would need a lot, since it is mostly being used as a file server). The "fragmentation" factor (I am only guessing here) comes from having many files written and deleted over time.

So my questions are: is there a solution or workaround for this; and is this a bug, or perhaps an undesirable feature? If the latter, should this be reported (somewhere)?

Any suggestions, tips, etc. greatly appreciated.

TIA

ken

From lakshmipathi.g at gmail.com  Sat Jan 18 12:13:11 2014
From: lakshmipathi.g at gmail.com (Lakshmipathi.G)
Date: Sat, 18 Jan 2014 17:43:11 +0530
Subject: File System corruption tool
Message-ID:

Hi -

I'm searching for a file system corruption tool - one that injects disk errors such as multiply-owned blocks, etc. An integrity-scan process (like e2fsck) would then verify the on-disk layout and fix these errors.

I'd like to read/understand such tools before writing one for a proprietary on-disk file system.

Do we have such tools for ext{2,3,4}fs? Thanks for any help or pointers!

--
----
Cheers,
Lakshmipathi.G
FOSS Programmer.
www.giis.co.in

From ricwheeler at gmail.com  Sat Jan 18 12:40:28 2014
From: ricwheeler at gmail.com (Ric Wheeler)
Date: Sat, 18 Jan 2014 07:40:28 -0500
Subject: File System corruption tool
In-Reply-To:
References:
Message-ID: <52DA763C.1090505@gmail.com>

On 01/18/2014 07:13 AM, Lakshmipathi.G wrote:
> Hi -
>
> I'm searching for a file system corruption tool - one that injects disk errors
> such as multiply-owned blocks, etc. An integrity-scan process (like e2fsck)
> would then verify the on-disk layout and fix these errors.
>
> I'd like to read/understand such tools before writing one for a proprietary
> on-disk file system.
>
> Do we have such tools for ext{2,3,4}fs? Thanks for any help or pointers!
>

For SATA drives, you can use hdparm to create a bad sector that will cause an IO error on read.
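For example, a rough sketch - the device name and sector number are placeholders, and the sector's contents are destroyed, so only try this on a scratch disk:

    # mark LBA sector 12345 as bad (it fails on read until rewritten)
    hdparm --yes-i-know-what-i-am-doing --make-bad-sector 12345 /dev/sdX

    # reading that sector now returns an IO error
    dd if=/dev/sdX of=/dev/null bs=512 skip=12345 count=1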
(Writing to the sector will fix it.)

Ric

From adilger at dilger.ca  Sat Jan 18 17:09:20 2014
From: adilger at dilger.ca (Andreas Dilger)
Date: Sat, 18 Jan 2014 10:09:20 -0700
Subject: Very long delay for first write to big filesystem
In-Reply-To:
References:
Message-ID: <9263807E-9BD9-41B0-AC1E-E7D4CBD4CA04@dilger.ca>

On Jan 17, 2014, at 9:32, Ken Bass wrote:
>
> The problem/issue: there is a very long delay when my system does a write to the filesystem. The delay now is over 5 minutes (yes: minutes). This only happens on the first write after booting up the system, and only for large files - 1GB or more. This can be a serious problem, since all access to any hard disk is blocked and hangs until that first write gets going again.
>
> The prevailing thought at the time was that the delay came from loading into memory the directory information while looking for free space, which I can now believe.

It isn't actually directory information that is being loaded, but rather the block bitmaps from each group, and each one needs a seek to read. This will take up to 7.5 TB / 128 MB/group / 100 seeks/sec = 600s if the filesystem is nearly full. After this point, the bitmaps are cached in memory and allocation is faster.
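As shell arithmetic, the same back-of-the-envelope estimate looks like this (assuming 4 KiB blocks, so 32768 blocks and hence 128 MiB per group, and roughly 100 random reads/sec from a single spindle):

    groups=$(( 7680 * 1024 / 128 ))   # 7.5 TiB expressed in MiB / 128 MiB per group = 61440 groups
    echo $(( groups / 100 ))          # one bitmap seek per group at ~100 seeks/sec = ~614 seconds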
> The filesystem in question is 7.5TB, with about 4TB used. There are over 250,000 files. I also have another system with 1TB total and 400GB used, with 65,000 files. This system, the smaller one, is beginning to show delays as well, although only a few seconds.
>
> This problem seems to involve several factors: the total size of the filesystem; the current "fragmentation" of that filesystem; and finally the amount of physical memory available.
>
> As to the last factor, the 7.5TB system has only 2GB of memory (I didn't think it would need a lot, since it is mostly being used as a file server). The "fragmentation" factor (I am only guessing here) comes from having many files written and deleted over time.
>
> So my questions are: is there a solution or workaround for this; and is this a bug, or perhaps an undesirable feature? If the latter, should this be reported (somewhere)?

You might consider mounting the filesystem as ext4 instead of ext3. It will do a slightly better job of finding contiguous free space and avoid loading bitmaps that do not have enough space, but the physics of seeking to read bitmaps is still the same.

If you format a new filesystem as ext4 (as opposed to just mounting the existing filesystem as ext4) you can use a new feature, "flex_bg", that locates the block and inode bitmaps together so that they can be read without so much seeking. You'd need a spare disk to format and copy the data over to.

Using ext4 is also more resistant to fragmentation over time.

Cheers, Andreas

> Any suggestions, tips, etc. greatly appreciated.
>
> TIA
>
> ken
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

From adilger at dilger.ca  Sat Jan 18 22:29:09 2014
From: adilger at dilger.ca (Andreas Dilger)
Date: Sat, 18 Jan 2014 15:29:09 -0700
Subject: File System corruption tool
In-Reply-To:
References:
Message-ID: <3698A346-9586-4656-8010-F4CDE0D3319E@dilger.ca>

We have a script that adds corruption to ext2/3/4 filesystems and runs e2fsck on it. It definitely could be improved, but it still catches the occasional error:
http://git.whamcloud.com/?p=tools/e2fsprogs.git;a=commit;h=aee44c669bebe29bfdb8a1c86da443234f8bc257

It tries to format the filesystem with different features and options, then adds corruption both from random data and by copying parts of the filesystem internally to other parts of the filesystem. It might also be useful to corrupt some random bits and bytes in the filesystem, but it doesn't do that today.

There is also fsfuzzer, which writes random data to the filesystem and tries to mount it, but I don't know whether that has been tried with e2fsck.

The other major question I have is why you are trying to create a new proprietary filesystem? That is really a ten-year effort, and you would be much better off using one of the many existing filesystems. If the current ones don't meet your exact needs, add the missing features you need instead of creating a whole new one from scratch. While I'm a big fan of ext4, there are many other good filesystems out there - XFS, Btrfs, ZFS, and several flash filesystems.

Cheers, Andreas

> On Jan 18, 2014, at 5:13, "Lakshmipathi.G" wrote:
>
> Hi -
>
> I'm searching for a file system corruption tool - one that injects disk errors
> such as multiply-owned blocks, etc. An integrity-scan process (like e2fsck)
> would then verify the on-disk layout and fix these errors.
>
> I'd like to read/understand such tools before writing one for a proprietary
> on-disk file system.
>
> Do we have such tools for ext{2,3,4}fs? Thanks for any help or pointers!
>
> --
> ----
> Cheers,
> Lakshmipathi.G
> FOSS Programmer.
> www.giis.co.in
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

From lakshmipathi.g at gmail.com  Sun Jan 19 14:24:28 2014
From: lakshmipathi.g at gmail.com (Lakshmipathi.G)
Date: Sun, 19 Jan 2014 19:54:28 +0530
Subject: File System corruption tool
In-Reply-To: <3698A346-9586-4656-8010-F4CDE0D3319E@dilger.ca>
References: <3698A346-9586-4656-8010-F4CDE0D3319E@dilger.ca>
Message-ID:

> For SATA drives, you can use hdparm to create a bad sector that will
> cause an IO error on read. (Writing to the sector will fix it.)

Thanks Ric Wheeler. Just looked into hdparm - it has a nice option for corrupting sectors, but I also need to manipulate on-disk entries (like putting an invalid nlink value into an inode structure, etc.).
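For ext{2,3,4} it looks like debugfs can do that kind of targeted damage - an untested sketch against a scratch image, with the image name, inode number and link count all arbitrary placeholders:

    # build a small scratch ext4 image
    dd if=/dev/zero of=test.img bs=1M count=64
    mke2fs -F -t ext4 test.img

    # open it writable and plant a bogus link count on inode 12
    debugfs -w -R "set_inode_field <12> links_count 42" test.img

    # e2fsck should detect and repair the bad count
    e2fsck -fy test.img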
> We have a script that adds corruption to ext2/3/4 filesystems and runs
> e2fsck on it. It definitely could be improved, but it still catches the
> occasional error

Thanks Andreas. I went through the script and liked the idea of creating a filesystem image from random files. It has only minimal options as of now. Will look into fsfuzzer.

> The other major question I have is why you are trying to create a new
> proprietary filesystem?

Sorry, I should have been clearer. I was assigned the task of creating a framework to corrupt the disk layout of an existing FreeBSD-based, closed-source file system. We already have an e2fsck-like integrity checker, but the disk-corruption script still needs to be added.

Thanks!

--
----
Cheers,
Lakshmipathi.G
FOSS Programmer.
www.giis.co.in

From daytooner at gmail.com  Mon Jan 20 02:07:54 2014
From: daytooner at gmail.com (Ken Bass)
Date: Sun, 19 Jan 2014 18:07:54 -0800
Subject: Very long delay for first write to big filesystem
In-Reply-To: <9263807E-9BD9-41B0-AC1E-E7D4CBD4CA04@dilger.ca>
References: <9263807E-9BD9-41B0-AC1E-E7D4CBD4CA04@dilger.ca>
Message-ID:

Thx Andreas.

re: block bitmaps - yes, that is what I really meant. My experience with filesystems is mainly from CP/M's BDOS, where "directory" and block mapping are essentially synonymous.

And now I understand about the timing. Makes sense when you describe it that way.

My system is ext4, although I doubt that I used the "flex_bg" option, since this was first created a while back. I did try to run e4defrag. It simply said that no defrag was needed.
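The check was something along these lines, with /dev/sdX1 standing in for my actual data disk:

    # -c only reports a fragmentation score; it doesn't move any data
    e4defrag -c /dev/sdX1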
So now I'm only left in need of a workaround. Perhaps a way to have the system load the bitmaps at boot time in the background? It would need to be done in such a way that it would not block any other access to that system.

Or, is there a better filesystem format that would not have this problem? (Not a really great solution, since I would need to somehow/somewhere back up my 7.5TB system first.)

It does seem strange that this hasn't become a more serious issue, as typical filesystems are getting bigger now. And I can't imagine a really large network server (10TB+) having to deal with this.

Again, thx for the response.

ken

On Sat, Jan 18, 2014 at 9:09 AM, Andreas Dilger wrote:
> On Jan 17, 2014, at 9:32, Ken Bass wrote:
> >
> > The problem/issue: there is a very long delay when my system does a write to the filesystem. The delay now is over 5 minutes (yes: minutes). This only happens on the first write after booting up the system, and only for large files - 1GB or more. This can be a serious problem, since all access to any hard disk is blocked and hangs until that first write gets going again.
> >
> > The prevailing thought at the time was that the delay came from loading into memory the directory information while looking for free space, which I can now believe.
>
> It isn't actually directory information that is being loaded, but rather the block bitmaps from each group, and each one needs a seek to read. This will take up to 7.5 TB / 128 MB/group / 100 seeks/sec = 600s if the filesystem is nearly full. After this point, the bitmaps are cached in memory and allocation is faster.
>
> > The filesystem in question is 7.5TB, with about 4TB used. There are over 250,000 files. I also have another system with 1TB total and 400GB used, with 65,000 files. This system, the smaller one, is beginning to show delays as well, although only a few seconds.
> >
> > This problem seems to involve several factors: the total size of the filesystem; the current "fragmentation" of that filesystem; and finally the amount of physical memory available.
> >
> > As to the last factor, the 7.5TB system has only 2GB of memory (I didn't think it would need a lot, since it is mostly being used as a file server). The "fragmentation" factor (I am only guessing here) comes from having many files written and deleted over time.
> >
> > So my questions are: is there a solution or workaround for this; and is this a bug, or perhaps an undesirable feature? If the latter, should this be reported (somewhere)?
>
> You might consider mounting the filesystem as ext4 instead of ext3. It will do a slightly better job of finding contiguous free space and avoid loading bitmaps that do not have enough space, but the physics of seeking to read bitmaps is still the same.
>
> If you format a new filesystem as ext4 (as opposed to just mounting the existing filesystem as ext4) you can use a new feature, "flex_bg", that locates the block and inode bitmaps together so that they can be read without so much seeking. You'd need a spare disk to format and copy the data over to.
>
> Using ext4 is also more resistant to fragmentation over time.
>
> Cheers, Andreas
>
> > Any suggestions, tips, etc. greatly appreciated.
> >
> > TIA
> >
> > ken
> >
> > _______________________________________________
> > Ext3-users mailing list
> > Ext3-users at redhat.com
> > https://www.redhat.com/mailman/listinfo/ext3-users

From adilger at dilger.ca  Mon Jan 20 21:46:01 2014
From: adilger at dilger.ca (Andreas Dilger)
Date: Mon, 20 Jan 2014 14:46:01 -0700
Subject: Very long delay for first write to big filesystem
In-Reply-To:
References: <9263807E-9BD9-41B0-AC1E-E7D4CBD4CA04@dilger.ca>
Message-ID: <805DD079-E4FD-4C4B-B418-9C165A76328C@dilger.ca>

On Jan 19, 2014, at 7:07 PM, Ken Bass wrote:
> re: block bitmaps - yes, that is what I really meant. My experience with filesystems is mainly from CP/M's BDOS, where "directory" and block mapping are essentially synonymous.
>
> And now I understand about the timing. Makes sense when you describe it that way.
>
> My system is ext4, although I doubt that I used the "flex_bg" option, since this was first created a while back. I did try to run e4defrag. It simply said that no defrag was needed.

Use "dumpe2fs -h /dev/XXX | grep feature" to see if it is listed.

> So now I'm only left in need of a workaround. Perhaps a way to have the system load the bitmaps at boot time in the background? It would need to be done in such a way that it would not block any other access to that system.

We had a similar problem in the past. Run "dumpe2fs /dev/XXX > /dev/null" at startup time (can be before or after mount) to start it reading the block and inode allocation bitmaps.
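Something like this in rc.local (or wherever your boot scripts live) should do - an untested sketch, with /dev/XXX a placeholder and the trailing "&" keeping it from blocking the rest of the boot:

    # pre-read the block/inode bitmaps in the background to warm the cache
    dumpe2fs /dev/XXX > /dev/null 2>&1 &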
> Or, is there a better filesystem format that would not have this problem? (Not a really great solution, since I would need to somehow/somewhere back up my 7.5TB system first.)

Yes, formatting with "mke2fs -t ext4" should enable flex_bg by default.

> It does seem strange that this hasn't become a more serious issue, as typical filesystems are getting bigger now. And I can't imagine a really large network server (10TB+) having to deal with this.

That's why the flex_bg feature was added to ext4 in the first place.

Cheers, Andreas

> Again, thx for the response.
>
> ken
>
> On Sat, Jan 18, 2014 at 9:09 AM, Andreas Dilger wrote:
> > On Jan 17, 2014, at 9:32, Ken Bass wrote:
> > >
> > > The problem/issue: there is a very long delay when my system does a write to the filesystem. The delay now is over 5 minutes (yes: minutes). This only happens on the first write after booting up the system, and only for large files - 1GB or more. This can be a serious problem, since all access to any hard disk is blocked and hangs until that first write gets going again.
> > >
> > > The prevailing thought at the time was that the delay came from loading into memory the directory information while looking for free space, which I can now believe.
> >
> > It isn't actually directory information that is being loaded, but rather the block bitmaps from each group, and each one needs a seek to read. This will take up to 7.5 TB / 128 MB/group / 100 seeks/sec = 600s if the filesystem is nearly full. After this point, the bitmaps are cached in memory and allocation is faster.
> >
> > > The filesystem in question is 7.5TB, with about 4TB used. There are over 250,000 files. I also have another system with 1TB total and 400GB used, with 65,000 files. This system, the smaller one, is beginning to show delays as well, although only a few seconds.
> > >
> > > This problem seems to involve several factors: the total size of the filesystem; the current "fragmentation" of that filesystem; and finally the amount of physical memory available.
> > >
> > > As to the last factor, the 7.5TB system has only 2GB of memory (I didn't think it would need a lot, since it is mostly being used as a file server). The "fragmentation" factor (I am only guessing here) comes from having many files written and deleted over time.
> > >
> > > So my questions are: is there a solution or workaround for this; and is this a bug, or perhaps an undesirable feature? If the latter, should this be reported (somewhere)?
> >
> > You might consider mounting the filesystem as ext4 instead of ext3. It will do a slightly better job of finding contiguous free space and avoid loading bitmaps that do not have enough space, but the physics of seeking to read bitmaps is still the same.
> >
> > If you format a new filesystem as ext4 (as opposed to just mounting the existing filesystem as ext4) you can use a new feature, "flex_bg", that locates the block and inode bitmaps together so that they can be read without so much seeking. You'd need a spare disk to format and copy the data over to.
> >
> > Using ext4 is also more resistant to fragmentation over time.
> >
> > Cheers, Andreas
> >
> > > Any suggestions, tips, etc. greatly appreciated.
> > >
> > > TIA
> > >
> > > ken
> > >
> > > _______________________________________________
> > > Ext3-users mailing list
> > > Ext3-users at redhat.com
> > > https://www.redhat.com/mailman/listinfo/ext3-users
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

Cheers, Andreas