[Linux-cluster] clvmd without GFS?
Matt Mitchell
mmitchell at virtualproperties.com
Thu Oct 28 15:35:43 UTC 2004
Michael Conrad Tadpol Tilstra wrote:
> For one node, it should be pretty fast, even for large directories.
> Also, you should make sure that you're not being bitten by ls. For many
> people, by default, ls is also stat-ing every entry in the directory.
> (for colors or the extra char at the end of the name.) Also, ls
> typically reads all of the entries, sorts them, and then formats and
> displays them.
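A quick way to take ls itself out of the picture is an unaliased, unsorted, uncolored listing; a minimal sketch (substitute whatever directory you are testing against):

```shell
# 'command' bypasses any ls alias; -U skips the sort, and with color
# disabled ls never stats the individual entries, so the kernel work
# reduces to a stream of getdents64 calls plus the pipe to wc.
dir=${DIR:-.}
command ls -U --color=never "$dir" | wc -l
```

Comparing that timing against a plain ls on the same directory separates ls overhead from actual filesystem/lock traffic.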
I do know about the issues with ls in general. While loading up the GFS
partition (as I referenced earlier) I noticed some interesting behavior.
I have a script copying one file at a time from the source drive running
on one of the cluster hosts, and I'm doing a 'strace ls' on the other.
The getdents64 syscalls take an average of about a third of a second to
return, which isn't that bad, I suppose, given the contention. What's
interesting is that the copying and the getdents64 calls seem to finish
at the same time, such that the two windows scroll in more-or-less
lockstep. It's hard to quantify, but it seems like the nodes are
spending a lot of time wrangling over the directory lock.

The two machines are clustered over their own network segment on their
secondary interfaces (which is, incidentally, made difficult by cman's
insistence on believing cluster.conf instead of the command line). At
least, they are supposed to be: there's still a lot of network traffic
on the primary interfaces. Is there a way to ensure that the cluster
chatter stays on one interface or the other?
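For reference, the usual approach (a hedged sketch; the node names here are hypothetical, and the exact cluster.conf schema may differ in a tree this old) is that cman joins the cluster using whatever address the clusternode name resolves to, so naming each node by a hostname bound to its private address keeps the heartbeat on that interface:

```xml
<!-- Hypothetical fragment: hudson-priv and node2-priv resolve
     (e.g. via /etc/hosts) to the secondary, cluster-only addresses,
     so cman binds its traffic to those interfaces. -->
<clusternodes>
  <clusternode name="hudson-priv"/>
  <clusternode name="node2-priv"/>
</clusternodes>
```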
Immediately after loading the directory I did this:
hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407
real 7m40.726s
user 0m5.541s
sys 1m58.229s
In any event, I unmounted and remounted the GFS partition to clear
state, then started another time sh -c 'ls 100032/mls/fmls_stills | wc
-l' on that node. There is still another (idle) node with the disks
mounted.
It's been going for about an hour now. If I strace the ls I can see it
moving (getdents64 calls are returning). top shows dlm_sendd taking 95%
of the CPU, and the load average is over 8. Based on ls's memory usage
in the previous run above, on these same files, I think it is about a
third of the way done.
Despite my attempts to control which interface the cluster uses, it
looks like the nodes are chattering over both (sar shows a steady
8-10 KB per second on each).
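To pin down which link the chatter actually rides, the kernel's raw per-interface byte counters can be sampled directly (a rough sketch; interface names vary by machine):

```shell
# Two snapshots of the per-interface counters; the interface whose
# receive/transmit byte columns grow between snapshots is the one
# carrying the cluster traffic.
cat /proc/net/dev
sleep 2
cat /proc/net/dev
```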
I must have something screwed up here. GFS gurus, please enlighten me.
Is it time to update the gfs code? I am using a fairly old version
(9/19 or thereabouts).
-m