[Linux-cluster] optimising DLM speed?

Alan Brown ajb2 at mssl.ucl.ac.uk
Tue Feb 15 17:59:08 UTC 2011

After lots of headbanging, I'm slowly realising that limits on GFS2 lock 
rates and totem message passing appear to be the main inhibitors of 
cluster performance.

Even on disks which are only mounted on one node (using lock_dlm), the 
ping_pong rate is - quite frankly - appalling, at about 5000 
locks/second, falling off to single digits when 3 nodes are active on 
the same directory.
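
For anyone wanting a quick number without building ping_pong, here is a minimal sketch of a single-process lock-rate probe. It is loosely modelled on the lock-next-byte, unlock-current-byte loop that ping_pong uses, but it is not the real tool: the function name lock_rate and its parameters are made up for illustration, and one process on its own exercises no inter-node contention, so it only gives a rough baseline.

```python
import fcntl
import os
import tempfile
import time


def lock_rate(path, num_locks=3, duration=1.0):
    """Return fcntl byte-range lock/unlock cycles per second on `path`.

    Hypothetical helper for illustration, not part of ping_pong.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Take the first byte lock, then walk the lock forward a byte at a time.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        i = 0
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < duration:
            nxt = (i + 1) % num_locks
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)  # lock next byte
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, i, os.SEEK_SET)    # release current byte
            i = nxt
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temp file on local disk; point the path at a GFS2 mount to compare.
    with tempfile.NamedTemporaryFile() as tmp:
        print(f"{lock_rate(tmp.name):.0f} locks/second")
```

Running it once against a file on the GFS2 mount and once against a local filesystem should give a like-for-like comparison on the same hardware.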

totem's defaults are pretty low:

(from man openais.conf)

max messages/second = 17
window_size = 50
encryption = on
encryption/decryption threads = 1
netmtu = 1500

I suspect tuning these would have a marked effect on performance.
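
To check what a node is actually configured with, something like the sketch below could pull the relevant directives out of a totem { } block. The directive names (max_messages, window_size, secauth, threads, netmtu) are my mapping of the list above onto openais.conf directives, the SAMPLE text is invented for illustration using the values quoted above, and parse_totem is a hypothetical helper rather than anything shipped with openais. As far as I can tell from the man page, max_messages is a per-token limit rather than a per-second one, so treat the "messages/second" label with some caution.

```python
def parse_totem(text):
    """Collect `key: value` directives from the top level of a totem { } block."""
    settings = {}
    stack = []  # names of the blocks we are currently inside
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif stack == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


# Illustrative config text only; not copied from a real cluster.
SAMPLE = """
totem {
    version: 2
    secauth: on
    threads: 1
    netmtu: 1500
    window_size: 50
    max_messages: 17
    interface {
        ringnumber: 0
    }
}
"""

if __name__ == "__main__":
    found = parse_totem(SAMPLE)
    for name in ("max_messages", "window_size", "secauth", "threads", "netmtu"):
        # Anything absent from the file is left at the built-in default.
        print(name, "=", found.get(name, "(not set, default applies)"))
```

Directives inside nested blocks such as interface { } are deliberately skipped, since the tuning knobs of interest sit at the top level of the totem block.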

gfs_controld and dlm_controld aren't even appearing in the CPU usage 
tables (the nodes have 24GB of RAM and dual 5560 CPUs).

We have 2 GFS clusters: 2 nodes (imap) and 3 nodes (fileserving).

The imap system has around 2.5-3 million small files in the Maildir imap 
tree, whilst the fileserver cluster has ~90 1TB filesystems of 1-4 
million files apiece (fileserver total is around 150 million files).

When things get busy or when users get silly and drop 10,000 files in a 
directory, performance across the entire cluster goes downhill badly - 
not just in the affected disk or directory.

Even worse: backups - it takes 20-28 hours to run a zero-file incremental 
backup of a filesystem holding 2.1 million files (ext4 takes about 8 
minutes for the same file set!)
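
To put those backup figures on a per-file basis, here is a small worked calculation. It uses only the numbers quoted above (2.1 million files, 20-28 hours, about 8 minutes) and assumes the time is spread evenly across files, which is a simplification.

```python
FILES = 2_100_000  # files in the filesystem being backed up


def per_file_ms(seconds, files=FILES):
    """Average wall-clock time per file, in milliseconds."""
    return seconds * 1000.0 / files


gfs2_low = per_file_ms(20 * 3600)   # 20 hour run
gfs2_high = per_file_ms(28 * 3600)  # 28 hour run
ext4 = per_file_ms(8 * 60)          # roughly 8 minutes

print(f"GFS2: {gfs2_low:.1f}-{gfs2_high:.1f} ms per file")
print(f"ext4: {ext4:.2f} ms per file")
print(f"slowdown: {gfs2_low / ext4:.0f}x-{gfs2_high / ext4:.0f}x")
# Prints:
# GFS2: 34.3-48.0 ms per file
# ext4: 0.23 ms per file
# slowdown: 150x-210x
```

Tens of milliseconds per file just to examine it is far more than a single stat() call would normally cost, which is at least consistent with per-file lock traffic dominating the run.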

All heartbeat/lock traffic is handled across a dedicated Gb switch with 
each cluster in its own vlan to ensure no external cruft gets in to 
cause problems.

I'm seeing heartbeat/lock LAN traffic peak out at about 120kb/s and 
4000pps per node at the moment. Clearly the switch isn't the problem - 
and using hardware accelerated igb devices I'm pretty sure the 
networking's fine too.

SAN side, there are 4 8Gb Qlogic cards facing the fabric and right now 
the whole mess talks to a Nexsan atabeast (which is slow, but seldom 
gets its command queue maxed out.)

Has anyone played much with the totem message timings? If so, what 
results have you had?

As a comparison, the same hardware using EXT4 on a standalone system can 
trivially max out multiple 1Gb/s interfaces while transferring 1-2Mb/s 
files and gives lock rates of 1.8-2.5 million locks/second even with 
multiple ping_pong processes running.
