[Linux-cluster] Directories with >100K files
nick at javacat.f2s.com
Wed Jan 21 10:32:02 UTC 2009
Quoting Steven Whitehouse <swhiteho at redhat.com>:
> On Tue, 2009-01-20 at 22:32 -0500, Jeff Sturm wrote:
> > > -----Original Message-----
> > > From: linux-cluster-bounces at redhat.com
> > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of
> > > nick at javacat.f2s.com
> > > Sent: Tuesday, January 20, 2009 5:19 AM
> > > To: linux-cluster at redhat.com
> > > Subject: [Linux-cluster] Directories with >100K files
> > >
> > > We have a GFS filesystem mounted over iSCSI. When doing an
> > > 'ls' on directories with several thousand files it takes
> > > around 10 minutes to get a response back -
> > You don't say how many nodes you have, or anything about your
> > networking.
> > Some general pointers:
> > - A plain "ls" is probably much faster than any variant that fetches inode
> > metadata, e.g. "ls -l". The latter performs a stat() on each
> > individual file which in turn triggers locking activity of some sort.
> > This is known to be slow on GFS1. (I've heard reports that GFS2 is/will
> > be better.)
> The latest gfs1 is also much better. It is a tricky thing to do
> efficiently, and not doing the stats is a good plan.
> > - You want a fast, reliable low-latency network for your cluster. Intel
> > GigE cards and a fast switch are a good bet.
> > - Unless your application needs access times or quota support, mounting
> > with "noquota,noatime" is a good idea. Maybe also "nodiratime".
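To illustrate the first pointer above, here is a minimal Python sketch of the difference between reading only the entry names in a directory (roughly what a plain "ls" needs) and calling stat() on every entry (what "ls -l" needs). The function names and the temporary demo directory are made up for illustration; on GFS you would point the functions at the real directory, and the timings will depend entirely on the cluster.

```python
import os
import tempfile
import time


def list_names_only(path):
    """Return entry names without stat()ing each entry (like plain 'ls')."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]


def list_with_stat(path):
    """Return (name, size) pairs; one lstat() call per entry (like 'ls -l')."""
    results = []
    for name in os.listdir(path):
        info = os.lstat(os.path.join(path, name))
        results.append((name, info.st_size))
    return results


def timed(func, path):
    """Run func(path) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a large directory; replace with a real path.
    with tempfile.TemporaryDirectory() as demo_dir:
        for i in range(1000):
            with open(os.path.join(demo_dir, "o%d.out" % i), "w") as handle:
                handle.write("x")
        names, t_names = timed(list_names_only, demo_dir)
        stats, t_stats = timed(list_with_stat, demo_dir)
        print("names only: %d entries in %.4fs" % (len(names), t_names))
        print("with lstat: %d entries in %.4fs" % (len(stats), t_stats))
```

On a local filesystem the gap is usually small. The thread's point is that on GFS each stat() also involves locking, so the second variant is the one expected to suffer.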
> > > Can anyone recommend any GFS tunables to help us out here ?
> > You could try bumping demote_secs up from its default of 5 minutes.
> > That'll cause locks to be held longer so they may not need to be
> > reacquired so often. It won't help with the initial directory listing,
> > but should help on subsequent invocations.
> > In your case, with "ls" taking 8 minutes to run, some locks initially
> > acquired during execution of the command have already been demoted by
> > the time it completes.
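The demotion argument can be made concrete with a rough back-of-the-envelope model. The sketch below assumes, purely for illustration, that the listing acquires its locks at a steady rate over the whole run and that a lock is demoted exactly demote_secs after it was acquired if nothing touches it again. Real glock behaviour is more involved, and the function name is hypothetical.

```python
def fraction_demoted_at_end(run_secs, demote_secs):
    """Fraction of locks already older than demote_secs when the run ends.

    Assumes locks are acquired uniformly over run_secs and are not reused
    during the run (a simplification, not a description of GFS internals).
    """
    if run_secs <= 0:
        raise ValueError("run_secs must be positive")
    if demote_secs >= run_secs:
        return 0.0
    return (run_secs - demote_secs) / run_secs


if __name__ == "__main__":
    # 8-minute listing with the 5-minute default quoted in the thread.
    print(fraction_demoted_at_end(8 * 60, 5 * 60))   # 0.375
    # A demote_secs longer than the run leaves nothing demoted at the end.
    print(fraction_demoted_at_end(8 * 60, 10 * 60))  # 0.0
```

Under these assumptions, a second "ls" run straight afterwards would have to reacquire a bit over a third of the locks unless demote_secs is raised above the run time.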
> Also the question to ask is how many nodes are accessing this
> filesystem? If more than one node is accessing the same directory and at
> least one of those does a write (i.e. inode create/delete) within the
> demote_secs time, then the demote_secs time will not make much
> difference since the locks will be pushed out by the other node's access
We have 4 nodes in our test env and 5 in our prod env.
The directory structure is as follows:
[root at finapp4 ~]# cd /apps/prod/prodcomn/admin/
[root at finapp4 admin]# ls
inbound install log out outbound scripts trace
[root at finapp4 admin]# ls log/ out/
PROD_finapp1 PROD_finapp2 PROD_finapp3 PROD_finapp4 PROD_finapp5 WFSC_oracleprod
o14679499.out o14798714.out PROD_finapp2 PROD_finapp4 WFSC_oracleprod
o14698655.out PROD_finapp1 PROD_finapp3 PROD_finapp5
The WFSC_oracleprod dirs in both the log/ and the out/ directories each contain over 120,000 small files.
This WFSC_oracleprod dir will be accessed by all cluster members for both read and write operations.
If it helps to make it any clearer, these servers are clustered Oracle Applications servers running concurrent managers.
> > > Should we set statfs_fast to 1 ?
> > Probably good to set this, regardless.
> > > What about glock_purge ?
> > Glock_purge helps limit CPU time consumed by gfs_scand when a large
> > number of unused glocks are present. See
> > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
> > . This may make your system run better but I'm not sure it's going to
> > help with listing your giant directories.
> Better to disable this altogether unless there is a very good reason to
> use it. It generally has the effect of pushing things out of cache early
> so is to be avoided.
> > > Here is the fstab entry for the GFS filesystem:
> > > /dev/vggfs/lvol00 /apps gfs _netdev 1 2
> > Try "noatime,noquota" here.
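As a small sketch of what that suggestion does to the entry, the Python helper below appends options to the fourth (options) field of an fstab line. The helper name is made up for illustration, and the option names are simply the ones suggested in this thread; check them against the mount documentation for your GFS version before editing /etc/fstab.

```python
def add_mount_options(fstab_line, extra_options):
    """Return fstab_line with extra_options appended to its options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields: %r" % fstab_line)
    options = fields[3].split(",")
    for option in extra_options:
        if option not in options:  # avoid listing an option twice
            options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


if __name__ == "__main__":
    original = "/dev/vggfs/lvol00 /apps gfs _netdev 1 2"
    print(add_mount_options(original, ["noatime", "noquota"]))
    # /dev/vggfs/lvol00 /apps gfs _netdev,noatime,noquota 1 2
```

The new options only take effect once the filesystem is mounted again with them.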
We also have the Oracle DB server accessing the GFS /apps directory from one of the Oracle Application servers via NFS, which I reckon is not helping performance. I'm trying to get the DBAs to give me a list of directories to export instead of exporting the whole /apps partition.
In testing, I can set statfs_fast to 1 and it makes no difference at all to an ls of any of the WFSC_oracleprod directories.
I am making tuning changes one at a time and seeing what happens ...
This really does seem to be harder than it should be.