From smccauliff at mail.arc.nasa.gov  Thu Aug  2 01:55:53 2007
From: smccauliff at mail.arc.nasa.gov (Sean McCauliff)
Date: Wed, 01 Aug 2007 18:55:53 -0700
Subject: Poor Performance WhenNumber of Files > 1M
Message-ID: <46B139A9.8070808@mail.arc.nasa.gov>

Hi all,

I plan on having about 100M files totaling about 8.5TiBytes.   To see 
how ext3 would perform with large numbers of files I've written a test 
program which creates a configurable number of files into a configurable 
number of directories, reads from those files, lists them and then 
deletes them.  Even up to 1M files ext3 seems to perform well and scale 
linearly; the time to execute the program on 1M files is about double 
the time it takes it to execute on .5M files.  But past 1M files it 
seems to have n^2 scalability.  Test details appear below.

Looking at the various options for ext3 nothing jumps out as the obvious 
one to use to improve performance.

Any recommendations?

Thanks!
Sean

------
Parameter one is number of files, parameter two is number of directories 
to write into.

Dell MD3000 + LVM2. 2x7 10k rpm SAS disks 128k stripe RAID-0 for 3.8 
TiBytes of total storage.  Fedora Core 6 x86_64.  2xQuad Core Xeon. 
Default mount and ext3 options used.

[root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000 1000

real    0m1.054s
user    0m0.128s
sys     0m0.382s

[root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 1000000 1000

real    1m0.938s
user    0m12.203s
sys     0m40.358s
[root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000000 
1000

real    13m39.881s
user    2m6.645s
sys     7m26.665s
[root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 20000000 
1000

real    44m46.359s
user    4m22.911s
sys     17m2.792s




From davids at webmaster.com  Thu Aug  2 04:42:28 2007
From: davids at webmaster.com (David Schwartz)
Date: Wed, 1 Aug 2007 21:42:28 -0700
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46B139A9.8070808@mail.arc.nasa.gov>
Message-ID: <MDEHLPKNGKAHNMBLJOLKKEHFFNAC.davids@webmaster.com>


> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes.   To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them.  Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files.  But past 1M files it
> seems to have n^2 scalability.  Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?

If you want performance that's not O(n^2), the number of directory levels
must go up one each time the order of magnitude of the number of files goes
up. That is, the number of files per directory must be constant.

Suppose you have a directory of N files. To locate each file requires N
location  operations each requiring looking at an average of N/2 files. So
it is O(N*(N2)), which is O(N^2).

Add another level of directories each time you increase the number of files
by a factor of 10.

DS




From darkonc at gmail.com  Thu Aug  2 05:11:29 2007
From: darkonc at gmail.com (Stephen Samuel)
Date: Wed, 1 Aug 2007 22:11:29 -0700
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46B139A9.8070808@mail.arc.nasa.gov>
References: <46B139A9.8070808@mail.arc.nasa.gov>
Message-ID: <6cd50f9f0708012211j3e065301nf6db714125d89538@mail.gmail.com>

Searching for directories (to ensure no duplicates, etc) is going to
be order N^2.

Size of the directory is likely to be a limiting factor.

Try increasing to 10000 directories (in two layors of 100 each).  I'll
bet you that the result will be a pretty good increase in speed
(getting back to the speeds that you had with 1M directories).


On 8/1/07, Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes.   To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them.  Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files.  But past 1M files it
> seems to have n^2 scalability.  Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?
>
-- 
Stephen Samuel http://www.bcgreen.com
778-861-7641



From ext3-users at harkless.org  Thu Aug  2 16:01:34 2007
From: ext3-users at harkless.org (Dan Harkless)
Date: Thu, 02 Aug 2007 09:01:34 -0700
Subject: "htree_dirblock_to_tree: bad entry in directory" error
Message-ID: <200708021601.l72G1YxC029439@www.harkless.org>


Hi.  I woke up this morning to find a ton of waiting emails complaining that
some cron jobs on my system couldn't run because one of my filesystems (ext3
on software RAID 1) was suddenly mounted read-only.  Always nice when you're
away from the server due to travel.  ;^>  I investigated in the logs and
found:

    2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
    2007-08-02 04:02:25 kern.err www kernel: Aborting journal on device md2.
    2007-08-02 04:02:25 kern.crit www kernel: ext3_abort called.
    2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
    2007-08-02 04:02:25 kern.crit www kernel: Remounting filesystem read-only

I unmounted the filesystem and ran fsck, but though it detected that the
filesystem had errors, it didn't report any findings during the check:

    fsck 1.35 (28-Feb-2004)
    e2fsck 1.35 (28-Feb-2004)
    /dev/md2: recovering journal
    /dev/md2 contains a file system with errors, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks

I remounted the filesystem and all *seems* to be okay now.  I was curious
what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
3619715 -exec ls -dioF {} \;', but the output showed that that was a
non-directory file created and last modified in 2004.  How could this be?

And what would cause an error like the above?  Am I out of the woods now, or
is there more checking of some kind that I should do to make sure this isn't
going to be happening again?

Thank you for your time!

-- 
Dan Harkless
http://harkless.org/dan/



From ulrich.windl at rz.uni-regensburg.de  Thu Aug  2 09:55:53 2007
From: ulrich.windl at rz.uni-regensburg.de (Ulrich Windl)
Date: Thu, 02 Aug 2007 11:55:53 +0200
Subject: kernel: EXT3-fs: Unsupported filesystem blocksize 8192 on md0.
Message-ID: <46B1C648.3202.1058EE1A@Ulrich.Windl.rkdvmks1.ngate.uni-regensburg.de>

Hi,

I made an ext3 filesystem with 8kB block size:
# mkfs.ext3 -T largefile -v -b 8192 /dev/md0
Warning: blocksize 8192 not usable on most systems.
mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=8192 (log=3)
Fragment size=8192 (log=3)
148480 inodes, 18940704 blocks
947035 blocks (5.00%) reserved for the super user
First data block=0
290 block groups
65528 blocks per group, 65528 fragments per group
512 inodes per group
Superblock backups stored on blocks:
        65528, 196584, 327640, 458696, 589752, 1638200, 1769256, 3210872,
        5307768, 8191000, 15923304

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

When mounting it as ext2 (by mistake), the kernel says:
EXT2-fs warning (device md0): ext2_fill_super: mounting ext3 filesystem as ext2

When finally mounting the filesystem as ext3, the kernel says:
kernel: EXT3-fs: Unsupported filesystem blocksize 8192 on md0.

I don't quite understand: About any larger filesystem of today can support blocks 
> 4kB. What is the problem here?

Kernel is 2.6.16.46-0.12-default (SUSE SLES10 SP1) on IA64

Regards,
Ulrich



From adilger at clusterfs.com  Fri Aug  3 16:49:42 2007
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 3 Aug 2007 10:49:42 -0600
Subject: "htree_dirblock_to_tree: bad entry in directory" error
In-Reply-To: <200708021601.l72G1YxC029439@www.harkless.org>
References: <200708021601.l72G1YxC029439@www.harkless.org>
Message-ID: <20070803164942.GP6142@schatzie.adilger.int>

On Aug 02, 2007  09:01 -0700, Dan Harkless wrote:
>     2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
> 
> I unmounted the filesystem and ran fsck, but though it detected that the
> filesystem had errors, it didn't report any findings during the check:
> 
>     fsck 1.35 (28-Feb-2004)
>     e2fsck 1.35 (28-Feb-2004)
>     /dev/md2: recovering journal
>     /dev/md2 contains a file system with errors, check forced.
>     Pass 1: Checking inodes, blocks, and sizes
>     Pass 2: Checking directory structure
>     Pass 3: Checking directory connectivity
>     Pass 4: Checking reference counts
>     Pass 5: Checking group summary information
>     /dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks
> 
> I remounted the filesystem and all *seems* to be okay now.  I was curious
> what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
> 3619715 -exec ls -dioF {} \;', but the output showed that that was a
> non-directory file created and last modified in 2004.  How could this be?

Note that the DIRECTORY is 3616894, and the entry within that directory
that was corrupted is 3619715.

> And what would cause an error like the above?  Am I out of the woods now, or
> is there more checking of some kind that I should do to make sure this isn't
> going to be happening again?

Given that there is no corruption on disk, I would put this toward some
kind of memory corruption.  It might be a single-bit error though, because
12 = 0xc and 132 = 0x84 so if you clear bit 0x80 from the name_len (leaving
a name_len = 4) it would be correct for a rec_len of 12.  Is the filename
4 characters long?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From coyli at suse.de  Fri Aug  3 17:54:48 2007
From: coyli at suse.de (Coly Li)
Date: Sat, 04 Aug 2007 01:54:48 +0800
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46B139A9.8070808@mail.arc.nasa.gov>
References: <46B139A9.8070808@mail.arc.nasa.gov>
Message-ID: <f8vqfi$2sm$1@sea.gmane.org>

How about the file size ? If the size is small, another performance kill should
be on disk inode layout. Because the order of access dentries of dir is probably
different from the order of allocating inodes in inode tables. This will make
much time wast on hard disk seeking for the first time.

Just FYI.

Coly

Sean McCauliff wrote:
> Hi all,
> 
> I plan on having about 100M files totaling about 8.5TiBytes.   To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them.  Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files.  But past 1M files it
> seems to have n^2 scalability.  Test details appear below.
> 
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
> 
> Any recommendations?
> 
> Thanks!
> Sean
> 
> ------
> Parameter one is number of files, parameter two is number of directories
> to write into.
> 
> Dell MD3000 + LVM2. 2x7 10k rpm SAS disks 128k stripe RAID-0 for 3.8
> TiBytes of total storage.  Fedora Core 6 x86_64.  2xQuad Core Xeon.
> Default mount and ext3 options used.
> 
> [root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000 1000
> 
> real    0m1.054s
> user    0m0.128s
> sys     0m0.382s
> 
> [root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 1000000
> 1000
> 
> real    1m0.938s
> user    0m12.203s
> sys     0m40.358s
> [root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000000
> 1000
> 
> real    13m39.881s
> user    2m6.645s
> sys     7m26.665s
> [root at galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 20000000
> 1000
> 
> real    44m46.359s
> user    4m22.911s
> sys     17m2.792s



From smccauliff at mail.arc.nasa.gov  Fri Aug  3 18:15:37 2007
From: smccauliff at mail.arc.nasa.gov (Sean McCauliff)
Date: Fri, 03 Aug 2007 11:15:37 -0700
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <6cd50f9f0708012211j3e065301nf6db714125d89538@mail.gmail.com>
References: <46B139A9.8070808@mail.arc.nasa.gov>
	<6cd50f9f0708012211j3e065301nf6db714125d89538@mail.gmail.com>
Message-ID: <46B370C9.8080801@mail.arc.nasa.gov>

It seems like my code has a bug where it would create ndirs = nfiles / 
nFilesPerDir.  So it was making more directories.   I thought that with 
dir_index option directory entry look up would be more like O(1) so that 
  scale up should be completely linear.

Multi level directory schemes seem to degrade performance more.

Sean

Stephen Samuel wrote:
> Searching for directories (to ensure no duplicates, etc) is going to
> be order N^2.
> 
> Size of the directory is likely to be a limiting factor.
> 
> Try increasing to 10000 directories (in two layors of 100 each).  I'll
> bet you that the result will be a pretty good increase in speed
> (getting back to the speeds that you had with 1M directories).
> 
> 
> On 8/1/07, Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
>> Hi all,
>>
>> I plan on having about 100M files totaling about 8.5TiBytes.   To see
>> how ext3 would perform with large numbers of files I've written a test
>> program which creates a configurable number of files into a configurable
>> number of directories, reads from those files, lists them and then
>> deletes them.  Even up to 1M files ext3 seems to perform well and scale
>> linearly; the time to execute the program on 1M files is about double
>> the time it takes it to execute on .5M files.  But past 1M files it
>> seems to have n^2 scalability.  Test details appear below.
>>
>> Looking at the various options for ext3 nothing jumps out as the obvious
>> one to use to improve performance.
>>
>> Any recommendations?
>>



From coyli at suse.de  Sat Aug  4 03:03:01 2007
From: coyli at suse.de (Coly Li)
Date: Sat, 04 Aug 2007 11:03:01 +0800
Subject: kernel: EXT3-fs: Unsupported filesystem blocksize 8192 on md0.
In-Reply-To: <46B1C648.3202.1058EE1A@Ulrich.Windl.rkdvmks1.ngate.uni-regensburg.de>
References: <46B1C648.3202.1058EE1A@Ulrich.Windl.rkdvmks1.ngate.uni-regensburg.de>
Message-ID: <f90qjg$m0d$1@sea.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In ext3, it is better to make the filesystem block not large than page size. On
x86 the typical page size is 4KB.

Coly

Ulrich Windl wrote:
> Hi,
> 
> I made an ext3 filesystem with 8kB block size:
> # mkfs.ext3 -T largefile -v -b 8192 /dev/md0
> Warning: blocksize 8192 not usable on most systems.
> mke2fs 1.38 (30-Jun-2005)
> Filesystem label=
> OS type: Linux
> Block size=8192 (log=3)
> Fragment size=8192 (log=3)
> 148480 inodes, 18940704 blocks
> 947035 blocks (5.00%) reserved for the super user
> First data block=0
> 290 block groups
> 65528 blocks per group, 65528 fragments per group
> 512 inodes per group
> Superblock backups stored on blocks:
>         65528, 196584, 327640, 458696, 589752, 1638200, 1769256, 3210872,
>         5307768, 8191000, 15923304
> 
> Writing inode tables: done
> Creating journal (32768 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> This filesystem will be automatically checked every 22 mounts or
> 180 days, whichever comes first.  Use tune2fs -c or -i to override.
> 
> When mounting it as ext2 (by mistake), the kernel says:
> EXT2-fs warning (device md0): ext2_fill_super: mounting ext3 filesystem as ext2
> 
> When finally mounting the filesystem as ext3, the kernel says:
> kernel: EXT3-fs: Unsupported filesystem blocksize 8192 on md0.
> 
> I don't quite understand: About any larger filesystem of today can support blocks 
>> 4kB. What is the problem here?
> 
> Kernel is 2.6.16.46-0.12-default (SUSE SLES10 SP1) on IA64
> 
> Regards,
> Ulrich

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFGs+xOuTp8cyZ5lTERAt73AKC9oxc+H1zEH3A/iaSNAst8qYA/KACglDB4
nNKY6K8+cDbknOwIBECNvFQ=
=h4EV
-----END PGP SIGNATURE-----



From rsalmon74 at gmail.com  Wed Aug  8 20:21:05 2007
From: rsalmon74 at gmail.com (Rene Salmon)
Date: Wed, 8 Aug 2007 15:21:05 -0500
Subject: Poor ext3 performance on RAID array
Message-ID: <ea07cdee0708081321j52de4c45i2d3f45396097c748@mail.gmail.com>

Hi list,


I am having some strange performance issues with ext3 and I am hoping to
get some advice/hints on how to make this better.  First some background
on the setup.

We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
parity drive and one dedicated spare.  The LUN is about 6.5TB.

Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
device and get speeds close to 200Mbytes/sec which is more or less the
max the card can do.

Next I create an xfs file system on the LUN and do a dd to xfs and get
speeds close to 150Mbytes/sec.

I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
the Raid controller cache turned on or off. Getting 50MBytes/sec with
the raid controller cache turned off.

I know that ext3 should perform better so I must be doing something
wrong.  Here is my mkfs.ext3

mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0

Thanks in advanced for any help on this.

Rene
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070808/ae7020b2/attachment.htm>

From Ionel.Gardais at tech-advantage.com  Wed Aug  8 20:31:40 2007
From: Ionel.Gardais at tech-advantage.com (GARDAIS Ionel)
Date: Wed, 8 Aug 2007 22:31:40 +0200
Subject: =?iso-8859-1?q?RE=A0=3A_Poor_ext3_performance_on_RAID_array?=
References: <ea07cdee0708081321j52de4c45i2d3f45396097c748@mail.gmail.com>
Message-ID: <F0A97025ACD0234A928AF05B74060E1807CB93@MAILSERV.beicip.fr>

Hi Rene,

You should try to add the "-E stride=X" option to the mkfs command line.
Where X is expalined in the man page.

This will basically map ext3 "blocks" on the RAID stripe size.

Ionel


-------- Message d'origine--------
De: ext3-users-bounces at redhat.com de la part de Rene Salmon
Date: mer. 08/08/2007 22:21
?: ext3-users at redhat.com
Objet : Poor ext3 performance on RAID array
 
Hi list,


I am having some strange performance issues with ext3 and I am hoping to
get some advice/hints on how to make this better.  First some background
on the setup.

We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
parity drive and one dedicated spare.  The LUN is about 6.5TB.

Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
device and get speeds close to 200Mbytes/sec which is more or less the
max the card can do.

Next I create an xfs file system on the LUN and do a dd to xfs and get
speeds close to 150Mbytes/sec.

I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
the Raid controller cache turned on or off. Getting 50MBytes/sec with
the raid controller cache turned off.

I know that ext3 should perform better so I must be doing something
wrong.  Here is my mkfs.ext3

mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0

Thanks in advanced for any help on this.

Rene

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070808/90c9b82d/attachment.htm>

From rsalmon74 at gmail.com  Wed Aug  8 20:53:43 2007
From: rsalmon74 at gmail.com (Rene Salmon)
Date: Wed, 8 Aug 2007 15:53:43 -0500
Subject: Poor ext3 performance on RAID array
Message-ID: <ea07cdee0708081353o6c45aa7alf0c590feb939b61d@mail.gmail.com>

Hi,

Thanks for the reply. I tried using the -E stride=X  option as follows:

mkfs.ext3 -b4096 -Tlargefile4 -E stride=640 /dev/dm-1

and got the same results around 50 MBytes/sec.  Maybe I have the wrong
number for stride so here is the math for that:

                          stride=stripe-size
                          Configure the filesystem  for  a  RAID  array
with
                          stripe-size filesystem blocks per stripe.


Here is some more detail on the RAID array.

RAID level : 5 (10 drives + 1 parity)
Chunk Size : 256 KB
Stripe Size : 2560 KB (10 drives * 256KB)

stride=640 * 4096(byte blocks) = 2560KB

I will try other stride options but they don't seem to change much.

Thanks
Rene




On 8/8/07, GARDAIS Ionel <Ionel.Gardais at tech-advantage.com> wrote:
>
>  Hi Rene,
>
> You should try to add the "-E stride=X" option to the mkfs command line.
> Where X is expalined in the man page.
>
> This will basically map ext3 "blocks" on the RAID stripe size.
>
> Ionel
>
>
> -------- Message d'origine--------
> De: ext3-users-bounces at redhat.com de la part de Rene Salmon
> Date: mer. 08/08/2007 22:21
> ?: ext3-users at redhat.com
> Objet : Poor ext3 performance on RAID array
>
> Hi list,
>
>
> I am having some strange performance issues with ext3 and I am hoping to
> get some advice/hints on how to make this better.  First some background
> on the setup.
>
> We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
> parity drive and one dedicated spare.  The LUN is about 6.5TB.
>
> Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
> device and get speeds close to 200Mbytes/sec which is more or less the
> max the card can do.
>
> Next I create an xfs file system on the LUN and do a dd to xfs and get
> speeds close to 150Mbytes/sec.
>
> I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
> the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
> the Raid controller cache turned on or off. Getting 50MBytes/sec with
> the raid controller cache turned off.
>
> I know that ext3 should perform better so I must be doing something
> wrong.  Here is my mkfs.ext3
>
> mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0
>
> Thanks in advanced for any help on this.
>
> Rene
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070808/80d23dae/attachment.htm>

From Ionel.Gardais at tech-advantage.com  Wed Aug  8 20:58:33 2007
From: Ionel.Gardais at tech-advantage.com (GARDAIS Ionel)
Date: Wed, 8 Aug 2007 22:58:33 +0200
Subject: =?iso-8859-1?q?RE=A0=3A_RE_=3A_Poor_ext3_performance_on_RAID_arr?=
	=?iso-8859-1?q?ay?=
References: <ea07cdee0708081353o6c45aa7alf0c590feb939b61d@mail.gmail.com>
Message-ID: <F0A97025ACD0234A928AF05B74060E1807CB94@MAILSERV.beicip.fr>

stride is the chunk size of your raid whatever the number of physical disks composing the RAID array.
So for 256k chunks with a block size of 4k, stride should be 256/4 = 64 instead of 640.

Maybe it will help.

Ionel


-------- Message d'origine--------
De: Rene Salmon [mailto:rsalmon74 at gmail.com]
Date: mer. 08/08/2007 22:53
?: GARDAIS Ionel
Cc: ext3-users at redhat.com
Objet : Re: RE : Poor ext3 performance on RAID array
 
Hi,

Thanks for the reply. I tried using the -E stride=X  option as follows:

mkfs.ext3 -b4096 -Tlargefile4 -E stride=640 /dev/dm-1

and got the same results around 50 MBytes/sec.  Maybe I have the wrong
number for stride so here is the math for that:

                          stride=stripe-size
                          Configure the filesystem  for  a  RAID  array
with
                          stripe-size filesystem blocks per stripe.


Here is some more detail on the RAID array.

RAID level : 5 (10 drives + 1 parity)
Chunk Size : 256 KB
Stripe Size : 2560 KB (10 drives * 256KB)

stride=640 * 4096(byte blocks) = 2560KB

I will try other stride options but they don't seem to change much.

Thanks
Rene




On 8/8/07, GARDAIS Ionel <Ionel.Gardais at tech-advantage.com> wrote:
>
>  Hi Rene,
>
> You should try to add the "-E stride=X" option to the mkfs command line.
> Where X is expalined in the man page.
>
> This will basically map ext3 "blocks" on the RAID stripe size.
>
> Ionel
>
>
> -------- Message d'origine--------
> De: ext3-users-bounces at redhat.com de la part de Rene Salmon
> Date: mer. 08/08/2007 22:21
> ?: ext3-users at redhat.com
> Objet : Poor ext3 performance on RAID array
>
> Hi list,
>
>
> I am having some strange performance issues with ext3 and I am hoping to
> get some advice/hints on how to make this better.  First some background
> on the setup.
>
> We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
> parity drive and one dedicated spare.  The LUN is about 6.5TB.
>
> Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
> device and get speeds close to 200Mbytes/sec which is more or less the
> max the card can do.
>
> Next I create an xfs file system on the LUN and do a dd to xfs and get
> speeds close to 150Mbytes/sec.
>
> I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
> the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
> the Raid controller cache turned on or off. Getting 50MBytes/sec with
> the raid controller cache turned off.
>
> I know that ext3 should perform better so I must be doing something
> wrong.  Here is my mkfs.ext3
>
> mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0
>
> Thanks in advanced for any help on this.
>
> Rene
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070808/066c3151/attachment.htm>

From rsalmon74 at gmail.com  Wed Aug  8 21:31:14 2007
From: rsalmon74 at gmail.com (Rene Salmon)
Date: Wed, 8 Aug 2007 16:31:14 -0500
Subject: Poor ext3 performance on RAID array
Message-ID: <ea07cdee0708081431pd290230i8f859ab40d845358@mail.gmail.com>

Hi,

I think I found part of the problem.  Our RAID array vendor explained to me
that they have a problem of not getting enough data from ext3 to do full
stripe writes when ext3 issues an cache flush command.

Basically something to do with the journaling and cache flushing.  To test
this I replaced ext3 with ext2 and now I get the expected results:

mkfs.ext2 -Tlargefile4 -b4096 /dev/dm-1

dd if=/dev/zero of=/mnt/test bs=1024K count=1024 conv=fsync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.85545 seconds, 157 MB/s

So now I think I will just try to just have ext3 journal to a different
device.  See if that helps any.

Thanks
Rene




On 8/8/07, GARDAIS Ionel <Ionel.Gardais at tech-advantage.com> wrote:
>
>  stride is the chunk size of your raid whatever the number of physical
> disks composing the RAID array.
> So for 256k chunks with a block size of 4k, stride should be 256/4 = 64
> instead of 640.
>
> Maybe it will help.
>
> Ionel
>
>
> -------- Message d'origine--------
> De: Rene Salmon [mailto:rsalmon74 at gmail.com <rsalmon74 at gmail.com>]
> Date: mer. 08/08/2007 22:53
> ?: GARDAIS Ionel
> Cc: ext3-users at redhat.com
> Objet : Re: RE : Poor ext3 performance on RAID array
>
> Hi,
>
> Thanks for the reply. I tried using the -E stride=X  option as follows:
>
> mkfs.ext3 -b4096 -Tlargefile4 -E stride=640 /dev/dm-1
>
> and got the same results around 50 MBytes/sec.  Maybe I have the wrong
> number for stride so here is the math for that:
>
>                           stride=stripe-size
>                           Configure the filesystem  for  a  RAID  array
> with
>                           stripe-size filesystem blocks per stripe.
>
>
> Here is some more detail on the RAID array.
>
> RAID level : 5 (10 drives + 1 parity)
> Chunk Size : 256 KB
> Stripe Size : 2560 KB (10 drives * 256KB)
>
> stride=640 * 4096(byte blocks) = 2560KB
>
> I will try other stride options but they don't seem to change much.
>
> Thanks
> Rene
>
>
>
>
> On 8/8/07, GARDAIS Ionel <Ionel.Gardais at tech-advantage.com> wrote:
> >
> >  Hi Rene,
> >
> > You should try to add the "-E stride=X" option to the mkfs command line.
> > Where X is expalined in the man page.
> >
> > This will basically map ext3 "blocks" on the RAID stripe size.
> >
> > Ionel
> >
> >
> > -------- Message d'origine--------
> > De: ext3-users-bounces at redhat.com de la part de Rene Salmon
> > Date: mer. 08/08/2007 22:21
> > ?: ext3-users at redhat.com
> > Objet : Poor ext3 performance on RAID array
> >
> > Hi list,
> >
> >
> > I am having some strange performance issues with ext3 and I am hoping to
> > get some advice/hints on how to make this better.  First some background
> > on the setup.
> >
> > We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
> > parity drive and one dedicated spare.  The LUN is about 6.5TB.
> >
> > Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
> > device and get speeds close to 200Mbytes/sec which is more or less the
> > max the card can do.
> >
> > Next I create an xfs file system on the LUN and do a dd to xfs and get
> > speeds close to 150Mbytes/sec.
> >
> > I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
> > the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
> > the Raid controller cache turned on or off. Getting 50MBytes/sec with
> > the raid controller cache turned off.
> >
> > I know that ext3 should perform better so I must be doing something
> > wrong.  Here is my mkfs.ext3
> >
> > mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0
> >
> > Thanks in advanced for any help on this.
> >
> > Rene
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070808/ad8b8b46/attachment.htm>

From adilger at clusterfs.com  Wed Aug  8 23:21:21 2007
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 8 Aug 2007 17:21:21 -0600
Subject: Poor ext3 performance on RAID array
In-Reply-To: <ea07cdee0708081321j52de4c45i2d3f45396097c748@mail.gmail.com>
References: <ea07cdee0708081321j52de4c45i2d3f45396097c748@mail.gmail.com>
Message-ID: <20070808232121.GW6689@schatzie.adilger.int>

On Aug 08, 2007  15:21 -0500, Rene Salmon wrote:
> I am having some strange performance issues with ext3 and I am hoping to
> get some advice/hints on how to make this better.  First some background
> on the setup.
> 
> We have a RAID 5 array 10+1+1 with one LUN.  That is 10 SATA drives one
> parity drive and one dedicated spare.  The LUN is about 6.5TB.
> 
> Using a 2Gbit/sec fiber channel card I can do some dd writes to the raw
> device and get speeds close to 200Mbytes/sec which is more or less the
> max the card can do.
> 
> Next I create an xfs file system on the LUN and do a dd to xfs and get
> speeds close to 150Mbytes/sec.
> 
> I want to use ext3 not xfs so next I put ext3 on the lun.  Now when I do
> the dd to the ext3 lun I get 25-50Mbytes/sec depending on whether I have
> the Raid controller cache turned on or off. Getting 50MBytes/sec with
> the raid controller cache turned off.
> 
> I know that ext3 should perform better so I must be doing something
> wrong.  Here is my mkfs.ext3
> 
> mkfs.ext3 -b4096 -Tlagefile4 /dev/dm-0

You could also try out ext4, that's where the real performance improvements
are...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From bruno at wolff.to  Thu Aug  9 03:52:41 2007
From: bruno at wolff.to (Bruno Wolff III)
Date: Wed, 8 Aug 2007 22:52:41 -0500
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46B139A9.8070808@mail.arc.nasa.gov>
References: <46B139A9.8070808@mail.arc.nasa.gov>
Message-ID: <20070809035241.GB26169@wolff.to>

On Wed, Aug 01, 2007 at 18:55:53 -0700,
  Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
> Hi all,
> 
> I plan on having about 100M files totaling about 8.5TiBytes.   To see 
> how ext3 would perform with large numbers of files I've written a test 
> program which creates a configurable number of files into a configurable 
> number of directories, reads from those files, lists them and then 
> deletes them.  Even up to 1M files ext3 seems to perform well and scale 
> linearly; the time to execute the program on 1M files is about double 
> the time it takes it to execute on .5M files.  But past 1M files it 
> seems to have n^2 scalability.  Test details appear below.
> 
> Looking at the various options for ext3 nothing jumps out as the obvious 
> one to use to improve performance.
> 
> Any recommendations?

Did you make sure directory indexing is available? I think that is the
default now for ext3, but maybe it wasn't turned on for your test.



From smccauliff at mail.arc.nasa.gov  Thu Aug  9 19:02:31 2007
From: smccauliff at mail.arc.nasa.gov (Sean McCauliff)
Date: Thu, 09 Aug 2007 12:02:31 -0700
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <20070809035241.GB26169@wolff.to>
References: <46B139A9.8070808@mail.arc.nasa.gov>
	<20070809035241.GB26169@wolff.to>
Message-ID: <46BB64C7.7090802@mail.arc.nasa.gov>

dumpe2fs reports that dir_index option is enabled.  But thank you for 
the suggestion.

Sean

Bruno Wolff III wrote:
> On Wed, Aug 01, 2007 at 18:55:53 -0700,
>   Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
>> Hi all,
>>
>> I plan on having about 100M files totaling about 8.5TiBytes.   To see 
>> how ext3 would perform with large numbers of files I've written a test 
>> program which creates a configurable number of files into a configurable 
>> number of directories, reads from those files, lists them and then 
>> deletes them.  Even up to 1M files ext3 seems to perform well and scale 
>> linearly; the time to execute the program on 1M files is about double 
>> the time it takes it to execute on .5M files.  But past 1M files it 
>> seems to have n^2 scalability.  Test details appear below.
>>
>> Looking at the various options for ext3 nothing jumps out as the obvious 
>> one to use to improve performance.
>>
>> Any recommendations?
> 
> Did you make sure directory indexing is available? I think that is the
> default now for ext3, but maybe it wasn't turned on for your test.
> 



From adilger at clusterfs.com  Thu Aug  9 19:51:55 2007
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 9 Aug 2007 13:51:55 -0600
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46BB64C7.7090802@mail.arc.nasa.gov>
References: <46B139A9.8070808@mail.arc.nasa.gov>
	<20070809035241.GB26169@wolff.to>
	<46BB64C7.7090802@mail.arc.nasa.gov>
Message-ID: <20070809195155.GZ6689@schatzie.adilger.int>

Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
>I plan on having about 100M files totaling about 8.5TiBytes.   To see 
>how ext3 would perform with large numbers of files I've written a test 
>program which creates a configurable number of files into a configurable 
>number of directories, reads from those files, lists them and then 
>deletes them.  Even up to 1M files ext3 seems to perform well and scale 
>linearly; the time to execute the program on 1M files is about double 
>the time it takes it to execute on .5M files.  But past 1M files it 
>seems to have n^2 scalability.  Test details appear below.
>
>Looking at the various options for ext3 nothing jumps out as the obvious 
>one to use to improve performance.

Try increasing your journal size (mke2fs -J size=400), and having a lot
of RAM.

When you say "having about 100M files", does that mean "need to be
constantly accessing 100M files" or just "need to store a total of
100M files in this filesystem"?

The former means you need to keep the whole working set in RAM for
maximum performance, about 100M * (128 + 32) = 19GB of RAM.  The
latter is no problem, we have ext3 filesystems with > 250M files
in them.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From smccauliff at mail.arc.nasa.gov  Thu Aug  9 20:04:03 2007
From: smccauliff at mail.arc.nasa.gov (Sean McCauliff)
Date: Thu, 09 Aug 2007 13:04:03 -0700
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <20070809195155.GZ6689@schatzie.adilger.int>
References: <46B139A9.8070808@mail.arc.nasa.gov>
	<20070809035241.GB26169@wolff.to>
	<46BB64C7.7090802@mail.arc.nasa.gov>
	<20070809195155.GZ6689@schatzie.adilger.int>
Message-ID: <46BB7333.3010602@mail.arc.nasa.gov>


> Try increasing your journal size (mke2fs -J size=400), and having a lot
> of RAM.
> 
> When you say "having about 100M files", does that mean "need to be
> constantly accessing 100M files" or just "need to store a total of
> 100M files in this filesystem"?
Likely only 10M will be accessed at any time.

> 
> The former means you need to keep the whole working set in RAM for
> maximum performance, about 100M * (128 + 32) = 19GB of RAM.  The
> latter is no problem, we have ext3 filesystems with > 250M files
> in them.
The system has 16G of RAM; getting 32G in the future is a possibility. 
  Where do you get 128 + 32 from?  Is 128 the inode size?  this is 
running a 64bit os.  Does that change the memory requirements?

Thanks, I will try your suggestion.

Sean




From adilger at clusterfs.com  Thu Aug  9 22:12:49 2007
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 9 Aug 2007 16:12:49 -0600
Subject: Poor Performance WhenNumber of Files > 1M
In-Reply-To: <46BB7333.3010602@mail.arc.nasa.gov>
References: <46B139A9.8070808@mail.arc.nasa.gov>
	<20070809035241.GB26169@wolff.to>
	<46BB64C7.7090802@mail.arc.nasa.gov>
	<20070809195155.GZ6689@schatzie.adilger.int>
	<46BB7333.3010602@mail.arc.nasa.gov>
Message-ID: <20070809221249.GC6689@schatzie.adilger.int>

On Aug 09, 2007  13:04 -0700, Sean McCauliff wrote:
> >When you say "having about 100M files", does that mean "need to be
> >constantly accessing 100M files" or just "need to store a total of
> >100M files in this filesystem"?
> Likely only 10M will be accessed at any time.

If you can structure it so the 10M files that will be accessed together
are stored to disk together, then your application will work better,
no matter what the filesystem.

> >The former means you need to keep the whole working set in RAM for
> >maximum performance, about 100M * (128 + 32) = 19GB of RAM.  The
> >latter is no problem, we have ext3 filesystems with > 250M files
> >in them.

> The system has 16G of RAM; getting 32G in the future is a possibility. 
> Where do you get 128 + 32 from?  Is 128 the inode size?  this is 
> running a 64bit os.  Does that change the memory requirements?

128 = inode size, 32 = directory entry size.  there will be other overhead
as well, but this will get you into the right ballpark.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From rd at powerset.com  Tue Aug 14 18:08:28 2007
From: rd at powerset.com (Ryan Dooley)
Date: Tue, 14 Aug 2007 11:08:28 -0700
Subject: unlink performance
Message-ID: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>

Greetings,

I'm looking for clues about speeding up unlink performance.  I have an ext3 file system (well, across several machines) that have a temp directory.  The application needs to clean up the temp directory before the next run of the application begins and the engineers want to clean out this temp directory  "quickly".

I have no hard numbers yet on what they are seeing but the contents of the directory involves "many files" and "many directories" of various sizes.

The file system in question is mounted with noatime,nodiratime with the following filesystem features:

 has_journal, resize_inode, dir_index, filetype, needs_recovery, sparse_super and large_file

The operating system is Fedora Core 6 with fedora's 2.6.20-1.2933 kernel.  The mounted file system is about 1.2TB in size and is a software raid-5 over four, 7200 rpm SATA disks.  The disks were formatted with all the defaults.

Pre-loading the file system cache (a la "find /path/to/temp -type f -print >/dev/null") followed by an "rm -rf /path/to/temp" seems to be pretty speedy to me.

Any other suggestions of things I can experiment with to build performance numbers?

Cheers,
Ryan



From mnalis-ml at voyager.hr  Tue Aug 14 23:43:21 2007
From: mnalis-ml at voyager.hr (Matija Nalis)
Date: Wed, 15 Aug 2007 01:43:21 +0200
Subject: unlink performance
In-Reply-To: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
References: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
Message-ID: <20070814234321.GA4179@eagle102.home.lan>

On Tue, Aug 14, 2007 at 11:08:28AM -0700, Ryan Dooley wrote:
> I'm looking for clues about speeding up unlink performance.  I have an
> ext3 file system (well, across several machines) that have a temp
> directory.  The application needs to clean up the temp directory before
> the next run of the application begins and the engineers want to clean out
> this temp directory "quickly".

Depending on your usage (and not dependant of specifics of filesystem), you 
might be able to get away with:

mv /path/to/temp /path/to/temp.old
mkdir /path/to/temp
rm -rf /path/to/temp.old &

as a workaround. in that way, the new run which fills the '/path/to/temp'
can commence practically without any delay even while old data is still
being removed, without any clash.

> The file system in question is mounted with noatime,nodiratime with the
> following filesystem features:
> 
>  has_journal, resize_inode, dir_index, filetype, needs_recovery, sparse_super and large_file

What is the journal size ? (echo 'stat <8>' | debugfs /dev/md0)
You can try increasing it. How much RAM is in the machine ?

> The operating system is Fedora Core 6 with fedora's 2.6.20-1.2933 kernel. 
> The mounted file system is about 1.2TB in size and is a software raid-5
> over four, 7200 rpm SATA disks.  The disks were formatted with all the
> defaults.

RAID5 would NOT be fastest choice for writes (including deletes), especially
depending on the stripe size, and is dependant on "mke2fs -E stride" option.

> Pre-loading the file system cache (a la "find /path/to/temp -type f -print
> >/dev/null") followed by an "rm -rf /path/to/temp" seems to be pretty
> speedy to me.

Then there is no problem, yes ? :)

-- 
Opinions above are GNU-copylefted.



From nigel.metheringham at dev.intechnology.co.uk  Wed Aug 15 08:13:27 2007
From: nigel.metheringham at dev.intechnology.co.uk (Nigel Metheringham)
Date: Wed, 15 Aug 2007 09:13:27 +0100
Subject: unlink performance
In-Reply-To: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
References: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
Message-ID: <2D70CC67-EE54-484A-96E0-51C6CF1B6590@dev.intechnology.co.uk>


On 14 Aug 2007, at 19:08, Ryan Dooley wrote:
> Pre-loading the file system cache (a la "find /path/to/temp -type f  
> -print >/dev/null") followed by an "rm -rf /path/to/temp" seems to  
> be pretty speedy to me.

Do you mean that:-
	find /path/to/temp -type f -print >/dev/null
	rm -rf /path/to/temp
is faster than just
	rm -rf /path/to/temp

or do you mean that you have arranged to do the find before the point  
where you want to delete?  If the former, that surprises me somewhat.

	Nigel.
--
[ Nigel Metheringham           Nigel.Metheringham at InTechnology.co.uk ]
[ - Comments in this message are my own and not ITO opinion/policy - ]




From rd at powerset.com  Wed Aug 15 17:12:31 2007
From: rd at powerset.com (Ryan Dooley)
Date: Wed, 15 Aug 2007 10:12:31 -0700
Subject: unlink performance
In-Reply-To: <2D70CC67-EE54-484A-96E0-51C6CF1B6590@dev.intechnology.co.uk>
References: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
	<2D70CC67-EE54-484A-96E0-51C6CF1B6590@dev.intechnology.co.uk>
Message-ID: <84E2AE771361E9419DD0EFBD31F09C4D48356CFA4D@EXVMBX015-1.exch015.msoutlookonline.net>

Actually I was curious if it would make any difference at all.

The only two "benchmarks" (really they are not) that I have are two different attempts:

  find /path/to/temp -type f -exec rm {} \;

This in about 18.39 seconds.

  rm -rf /path/to/temp

That finished in 16.43 seconds so you're assumption that the find didn't actually help anything is true.

What I don't have handy is how big those temp directories were or how many files were included (and what size those files were).

Now probably wondering why I can't wait 18-20 seconds but this was just a small test case.  The data set will be much larger in normal cases.

Cheers,
Ryan

-----Original Message-----
From: Nigel Metheringham [mailto:nigel.metheringham at dev.intechnology.co.uk]
Sent: Wednesday, August 15, 2007 1:13 AM
To: Ryan Dooley
Cc: ext3-users at redhat.com
Subject: Re: unlink performance


On 14 Aug 2007, at 19:08, Ryan Dooley wrote:
> Pre-loading the file system cache (a la "find /path/to/temp -type f
> -print >/dev/null") followed by an "rm -rf /path/to/temp" seems to
> be pretty speedy to me.

Do you mean that:-
        find /path/to/temp -type f -print >/dev/null
        rm -rf /path/to/temp
is faster than just
        rm -rf /path/to/temp

or do you mean that you have arranged to do the find before the point
where you want to delete?  If the former, that surprises me somewhat.

        Nigel.
--
[ Nigel Metheringham           Nigel.Metheringham at InTechnology.co.uk ]
[ - Comments in this message are my own and not ITO opinion/policy - ]





From darkonc at gmail.com  Thu Aug 16 14:10:22 2007
From: darkonc at gmail.com (Stephen Samuel)
Date: Thu, 16 Aug 2007 07:10:22 -0700
Subject: unlink performance
In-Reply-To: <84E2AE771361E9419DD0EFBD31F09C4D48356CFA4D@EXVMBX015-1.exch015.msoutlookonline.net>
References: <84E2AE771361E9419DD0EFBD31F09C4D48356CF8B0@EXVMBX015-1.exch015.msoutlookonline.net>
	<2D70CC67-EE54-484A-96E0-51C6CF1B6590@dev.intechnology.co.uk>
	<84E2AE771361E9419DD0EFBD31F09C4D48356CFA4D@EXVMBX015-1.exch015.msoutlookonline.net>
Message-ID: <6cd50f9f0708160710o1bfbfee4ob97251e59f12d7b9@mail.gmail.com>

I think that the mv / mkdir is going to be your best move...
If you can't move the whole directory, try:


mkdir /tmp2
mv /tmp/{*,.??*} /tmp2
(restart the application)
rm -rf /tmp2/* /tmp2/.??*

(it depends, of course, on the two directories being on the same partition).

That approach especially works if the bulk of your files are in
sub-directories  You'll only be doing work on the inodes directly in the
main directory, and then you can take your time deleting the subdirectories
from /tmp2 while your app runs.


I'm looking for clues about speeding up unlink performance.  I have an ext3
> file system (well, across several machines) that have a temp directory.  The
> application needs to clean up the temp directory before the next run of the
> application begins and the engineers want to clean out this temp directory
>  "quickly".
>

-- 
Stephen Samuel http://www.bcgreen.com
778-861-7641
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070816/90de8c62/attachment.htm>

From rd at powerset.com  Mon Aug 20 15:11:20 2007
From: rd at powerset.com (Ryan Dooley)
Date: Mon, 20 Aug 2007 08:11:20 -0700
Subject: unlink performance
In-Reply-To: <6cd50f9f0708160710o1bfbfee4ob97251e59f12d7b9@mail.gmail.com>
Message-ID: <C2EEFD28.F71E%rd@powerset.com>

On 8/16/07 7:10 AM, "Stephen Samuel" <darkonc at gmail.com> wrote:

I think that the mv / mkdir is going to be your best move...
If you can't move the whole directory, try:


mkdir /tmp2
mv /tmp/{*,.??*} /tmp2
(restart the application)
rm -rf /tmp2/* /tmp2/.??*

(it depends, of course, on the two directories being on the same partition).

That approach especially works if the bulk of your files are in sub-directories  You'll only be doing work on the inodes directly in the main directory, and then you can take your time deleting the subdirectories from /tmp2 while your app runs.

I'll give that a shot and see what happens.  Thanks!

Cheers,
Ryan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20070820/4c450191/attachment.htm>

From beevis at libero.it  Tue Aug 28 13:29:46 2007
From: beevis at libero.it (beevis at libero.it)
Date: Tue, 28 Aug 2007 15:29:46 +0200
Subject: Reserved space
Message-ID: <JNHK5M$7D4F60356F3D6F01639338098F6E441C@libero.it>

Hello list,
I have a doubt. When creating an ext3 fs, 5% of its space is reserved to the superuser.
I understand this should be important for /, maybe /var and /tmp.
But is it compulsory for other fs, like, say, an external disk with data?
Or it's just a heritage, no more needed? Could one safely reclaim this 5%?
I understand that no other fs (jfs, xfs, reiser) reserves some space.
Thanks for clarification



From tytso at mit.edu  Tue Aug 28 14:28:42 2007
From: tytso at mit.edu (Theodore Tso)
Date: Tue, 28 Aug 2007 10:28:42 -0400
Subject: Reserved space
In-Reply-To: <JNHK5M$7D4F60356F3D6F01639338098F6E441C@libero.it>
References: <JNHK5M$7D4F60356F3D6F01639338098F6E441C@libero.it>
Message-ID: <20070828142842.GC31120@thunk.org>

On Tue, Aug 28, 2007 at 03:29:46PM +0200, beevis at libero.it wrote:
> Hello list,
> I have a doubt. When creating an ext3 fs, 5% of its space is reserved to the superuser.
> I understand this should be important for /, maybe /var and /tmp.
> But is it compulsory for other fs, like, say, an external disk with data?
> Or it's just a heritage, no more needed? Could one safely reclaim this 5%?
> I understand that no other fs (jfs, xfs, reiser) reserves some space.

You can, but the performance of the filesystem will go down as you use
the last 5%, especially if the filesystem is dynamic and constantly
changing, since it will cause the files to become very badly
fragmented.  UFS historically used 10% for its anti-fragmentation
reserve.  With ext3 we decreased it to 5%.

If the filesystem is going to be essentially static after you fill it
up, sure you can reduce it down to 0%.  But if the filesystem is going
to be continuously active, you will get better performance by buying a
bigger hard drive and using a filesystem with an average utilization
of 50-80% than one which is hovering between 99-100%.  Aside from
spending 100-200 Euro's on extra memory, speading 100-200 Euro's on a
newer, bigger hard drive can be one of the easist and cheapest way to
improve the performance of your system.

Regards,

					- Ted



From beevis at libero.it  Tue Aug 28 18:49:37 2007
From: beevis at libero.it (beevis at libero.it)
Date: Tue, 28 Aug 2007 20:49:37 +0200
Subject: Reserved space
Message-ID: <JNHYYP$21A780D1278344EA71ECEE5A50BFD855@libero.it>

> Hello list,
> I have a doubt. When creating an ext3 fs, 5% of its space is reserved to the superuser.
> I understand this should be important for /, maybe /var and /tmp.
> But is it compulsory for other fs, like, say, an external disk with data?
> Or it's just a heritage, no more needed? Could one safely reclaim this 5%?
> I understand that no other fs (jfs, xfs, reiser) reserves some space.

You can, but the performance of the filesystem will go down as you use
the last 5%, especially if the filesystem is dynamic and constantly
changing, since it will cause the files to become very badly
fragmented.  UFS historically used 10% for its anti-fragmentation
reserve.  With ext3 we decreased it to 5%.

If the filesystem is going to be essentially static after you fill it
up, sure you can reduce it down to 0%.  But if the filesystem is going
to be continuously active, you will get better performance by buying a
bigger hard drive and using a filesystem with an average utilization
of 50-80% than one which is hovering between 99-100%.  Aside from
spending 100-200 Euro's on extra memory, speading 100-200 Euro's on a
newer, bigger hard drive can be one of the easist and cheapest way to
improve the performance of your system.

Regards,

                                        - Ted

Thanks for clarification. So I understand this 5% is reserved in order to prevent fragmentation.



From tytso at mit.edu  Tue Aug 28 21:59:44 2007
From: tytso at mit.edu (Theodore Tso)
Date: Tue, 28 Aug 2007 17:59:44 -0400
Subject: Reserved space
In-Reply-To: <JNHYYP$21A780D1278344EA71ECEE5A50BFD855@libero.it>
References: <JNHYYP$21A780D1278344EA71ECEE5A50BFD855@libero.it>
Message-ID: <20070828215944.GD31120@thunk.org>

On Tue, Aug 28, 2007 at 08:49:37PM +0200, beevis at libero.it wrote:
> Thanks for clarification. So I understand this 5% is reserved in
> order to prevent fragmentation.

If you want to be pedantic, to allow ext3's anti-fragmentation
algorithsm to work more efficiently.  It will not (by a long shot!)
completely remove fragmentation, but rather, fragmentation will
increase as you use last 5-10% of the filesystem.  We arbitrarily set
5% as the reserve; UFS (as used in Solaris, BSD, and many other
historical Unix systems) set the reserve at 10%.

	   		     	 	 - Ted




From jpiszcz at lucidpixels.com  Tue Aug 28 22:45:56 2007
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Tue, 28 Aug 2007 18:45:56 -0400 (EDT)
Subject: Reserved space
In-Reply-To: <20070828215944.GD31120@thunk.org>
References: <JNHYYP$21A780D1278344EA71ECEE5A50BFD855@libero.it>
	<20070828215944.GD31120@thunk.org>
Message-ID: <Pine.LNX.4.64.0708281845300.11171@p34.internal.lan>



On Tue, 28 Aug 2007, Theodore Tso wrote:

> On Tue, Aug 28, 2007 at 08:49:37PM +0200, beevis at libero.it wrote:
>> Thanks for clarification. So I understand this 5% is reserved in
>> order to prevent fragmentation.
>
> If you want to be pedantic, to allow ext3's anti-fragmentation
> algorithsm to work more efficiently.  It will not (by a long shot!)
> completely remove fragmentation, but rather, fragmentation will
> increase as you use last 5-10% of the filesystem.  We arbitrarily set
> 5% as the reserve; UFS (as used in Solaris, BSD, and many other
> historical Unix systems) set the reserve at 10%.
>
> 	   		     	 	 - Ted
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>

Does anyone have any metrics as to how bad it gets for 1% 2% 3% 4% and 5% 
reservations?  The difference between 1% and 5% can be a lot with 1TB 
drives etc.

Justin.



From raghuprasath at yahoo.com  Wed Aug 29 10:02:06 2007
From: raghuprasath at yahoo.com (Kannan Raghuprasath)
Date: Wed, 29 Aug 2007 03:02:06 -0700 (PDT)
Subject: ext3-fs error with RAID 5 Array.
Message-ID: <564417.24036.qm@web34711.mail.mud.yahoo.com>

Hi,

I have a Fedora core 4 machine (kernel-
2.6.11-1.1369_FC4smp) conneted to external DAS using
Ultra 320 SCSI controller card. The DAS is configured
as RAID 5 of 3.5 TB. I partitioned this array into two
partitions of size 1.9TB and 1.6 TB. These are
assigned to 2 different LUNS so that these two appear
as two partitions in the my Linux machine. 

I use this DAS for backup and after backing up for 30
hours i observe that partition gets remounted as
read-only with following error message in dmesg,

EXT3-fs error (device sdb1): ext3_add_entry: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25
Aborting journal on device sdb1.
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb:
Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sdb1) in start_transaction:
Journal has aborted
EXT3-fs error (device sdb1) in ext3_create: IO failure
EXT3-fs error (device sdb1): ext3_readdir: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25
EXT3-fs error (device sdb1): ext3_readdir: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25
EXT3-fs error (device sdb1): ext3_readdir: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25
EXT3-fs error (device sdb1): ext3_readdir: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25
EXT3-fs error (device sdb1): ext3_readdir: bad entry
in directory #2: directory entry across blocks -
offset=1080, inode=135216, rec_len=4132, name_len=25


Any suggestions on how to solve this problem?

Best regards and thanks in advance for your help,

Raghu



       
____________________________________________________________________________________
Building a website is a piece of cake. Yahoo! Small Business gives you all the tools to get online.
http://smallbusiness.yahoo.com/webhosting