[Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>
Kadlecsik Jozsef
kadlec at sunserv.kfki.hu
Thu Apr 10 13:00:40 UTC 2008
On Wed, 9 Apr 2008, Wendy Cheng wrote:
> > What led me to suspect clashing in the hash (or some other lock-creating
> > issue) was the simple test I made on our five node cluster: on one node I
> > ran
> >
> > find /gfs -type f -exec cat {} > /dev/null \;
> >
> > and on another one just started an editor, naming a non-existent file.
> > It took multiple seconds for the editor to "open" the file. What else but
> > creating the lock could delay the process for so long?
> >
>
> Not knowing how "find" is implemented, I would guess this is caused by
> directory locks. Creating a file needs a directory lock. Your exclusive write
> lock (file create) can't be granted until "find" releases the directory
> lock. It doesn't look like a lock-query performance issue to me.
As /gfs is a large directory structure with hundreds of user home
directories, I don't think I happened to pick the same directory that
"find" was just processing.
But this is a good clue to what might bite us most! Our GFS cluster is an
almost mail-only cluster for users with Maildir. When users experience
temporary hangups of several seconds (even when composing a new mail), it
might be due to the MUA concurrently scanning for new mail on one node
while the MTA delivers to the same Maildir on another node.
What is really strange (and disturbing) is that such "hangups" can take
10-20 seconds, which is just too much for the users.
In order to look at the possible tuning options and the side effects, I
list what I have learned so far:
- Increasing glock_purge (percent, default 0) helps gfs_scand itself trim
back the unused glocks. Otherwise glocks accumulate and gfs_scand eats
more and more time scanning the larger and larger table of glocks.
- gfs_scand wakes up every scand_secs (default 5s) to scan the glocks,
looking for work to do. By increasing scand_secs one can lessen the load
produced by gfs_scand, but it'll hurt because flushing data can be
delayed.
- Decreasing demote_secs (seconds, default 300) helps to flush cached data
more often by moving write locks into less restricted states. Frequent
flushing helps to avoid burstiness *and* to shorten other nodes'
wait for the lock. The question is, what are the side effects of small
demote_secs values? (There is probably not much point in choosing a
demote_secs value smaller than scand_secs.)
Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'.
> > But 'flushing when releasing the glock' looks like a side effect. I mean,
> > isn't there a more direct way to control the flushing?
>
> To make a long story short, I submitted a direct cache-flush patch first,
> instead of this final version of the lock-trimming patch. Unfortunately, it
> was *rejected*.
I see. Another question, just out of curiosity: why don't you use a kernel
timer for every glock instead of gfs_scand? The hash bucket id of the
glock would have to be added to struct gfs_glock, but the timer function
could be almost identical to scan_glock. As far as I can see, the only
drawback is that it would be equivalent to 'glock_purge = 100', and it
would be tricky to emulate glock_purge != 100 settings.
Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary