[Linux-cluster] performance bottleneck on 36-disk GFS/NFS cluster
Riaan van Niekerk
riaan at obsidian.co.za
Wed Jul 19 15:33:07 UTC 2006
We have a 2.5 TB GFS (6.1) with 2 TB of data spread via RAID10 metaLUNs
on an EMC CX500. The GFS is running on 4 nodes of varying size (some
have 24 CPUs and 2 GB RAM, others 2 CPUs and 1 GB RAM), exported via NFS to 10 NFS
clients which are POP/IMAP/SMTP servers in an ISP environment. The IPs
are managed by rgmanager.
The data is a couple of hundred thousand mailboxes in MailDir format.
Some performance metrics:
- Load average for NFS servers is about 8 - 16 per NFS client mounted.
The big servers have 4 clients each (32 - 64 load average). The smaller
servers have 1 client each (8 - 16 load average).
- dlm_recvd is by far the busiest process in top (10 - 20%), followed by
nfsd processes and gfs_inoded, lock_dlm, gfs_scand.
Here is one sample from the output of "iostat -x dm-0 5":
Device:  rrqm/s  wrqm/s      r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s
dm-1       0.00    0.00  2043.23  970.91  16345.86  7767.27  8172.93  3883.64

Device:  avgrq-sz  avgqu-sz  await  svctm   %util
dm-1         8.00     35.62  11.86   0.34  101.01
Some notable numbers are 101.01 % utilization, reads per sec and writes
per sec in the thousands.
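
As a rough sanity check on the iostat sample, the columns can be cross-checked against each other. The sketch below only uses the figures quoted above; the 512-byte sector size and the use of Little's law (queue depth = arrival rate x time in system) are assumptions about how iostat derives its columns, not something measured on this cluster.

```python
# Values copied from the iostat sample for dm-1.
reads_per_s, writes_per_s = 2043.23, 970.91
rsec_per_s, wsec_per_s = 16345.86, 7767.27
await_ms, svctm_ms = 11.86, 0.34

SECTOR_BYTES = 512  # assumption: iostat counts 512-byte sectors

# Total requests per second issued to the device.
iops = reads_per_s + writes_per_s

# Average request size in KB (should agree with avgrq-sz = 8 sectors).
avg_request_kb = (rsec_per_s + wsec_per_s) / iops * SECTOR_BYTES / 1024

# Little's law: requests in flight = request rate * time each request spends queued + served.
queue_depth = iops * await_ms / 1000.0

# Fraction of time the device is busy = request rate * service time per request.
utilization = iops * svctm_ms / 1000.0

print(f"IOPS: {iops:.2f}")
print(f"average request size: {avg_request_kb:.2f} KB")
print(f"estimated queue depth: {queue_depth:.2f} (iostat avgqu-sz: 35.62)")
print(f"estimated utilization: {utilization * 100:.1f}% (iostat %util: 101.01)")
```

If these assumptions hold, the sample is internally consistent: roughly 3000 small (about 4 KB) requests per second, with about 36 requests queued at any moment.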
My questions are:
1 (Taking a guesstimate) Is my problem a lack of spindles, or the
inefficiency of NFS on top of GFS? (I have often heard others on the
list complain about NFS on GFS performance.)
2 What would be the best way to improve performance? We have a couple of
options:
a) Collapse/remove the NFS layer. Make a number of the mail servers
SAN-attached GFS nodes (perhaps not all, but 4 to 6, by converting the
GFS nodes into mail servers).
b) Add more spindles. We are in the process of adding 24 more spindles
(the CX is already taking almost a day to restripe the metaLUN). We might
be able to add 24 more spindles after that and restripe again.
3 Is having dlm_recvd as the top process normal/typical for an I/O-bound
GFS cluster? Even though the MailDir mail store consists of millions of
files, nodes should very rarely write into the same directory at the
same time (meaning that directory lock contention should be avoided).
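
To put a rough number on the spindle question (1 and 2b), here is a back-of-envelope estimate of back-end IOPS per disk. It is only a sketch under several assumptions: that all 36 disks from the subject line sit behind this one metaLUN, that each host write costs two disk writes in RAID10, that the array cache absorbs nothing, and that the single iostat sample above is representative. It also ignores whatever load the other GFS nodes add. The function name and the write_penalty parameter are made up for illustration.

```python
def backend_iops_per_disk(reads_per_s, writes_per_s, disks, write_penalty=2):
    """Estimate per-disk IOPS for a mirrored-stripe (RAID10) layout.

    write_penalty=2 is an assumption: each host write lands on both mirror halves.
    """
    backend_iops = reads_per_s + write_penalty * writes_per_s
    return backend_iops / disks

# r/s and w/s from the iostat sample above (one node only).
reads_per_s, writes_per_s = 2043.23, 970.91

current = backend_iops_per_disk(reads_per_s, writes_per_s, disks=36)
after_expansion = backend_iops_per_disk(reads_per_s, writes_per_s, disks=36 + 24)

print(f"36 disks: {current:.1f} IOPS per disk")
print(f"60 disks: {after_expansion:.1f} IOPS per disk")
```

Under those assumptions the same load would drop from roughly 111 to roughly 66 IOPS per disk after adding 24 spindles. That says nothing about how much of the time is spent in DLM locking rather than on the disks.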
Thank you in advance.