[Linux-cluster] High DLM CPU usage - low GFS/iSCSI performance

Thu Feb 24 10:02:41 UTC 2011

Hi,

On Thu, 2011-02-24 at 10:34 +0100, Martijn Storck wrote:
> Hello everyone,
> 
> 
> We currently have the following RHCS cluster in operation:
> 
> 
> - 3 nodes, Xeon CPU, 12 GB hardware etc.
> - 100 Mbit network between the cluster nodes
> - Dell MD3200i iSCSI SAN, with 4 Gbit links (dm-multipath) to each
> server (through two switches), 5 15k RPM spindles
> - 1 GFS1 file system on the above mentioned SAN
> 
> 
> 2 of the nodes share a single GFS file system, which is used for
> hosting virtual machine containers (for web serving, mail and light
> database work). We've noticed that performance is suboptimal so we've
> started to investigate. The load is not high (we previously ran the
> same containers on a single, much cheaper server using local 7200rpm
> disks and ext3 fs without issues), but there is a lot of small block
> I/O.
> 
> 
> When I run iptraf (only monitoring the iSCSI traffic) and top side by
> side on a single server I often see dlm_send using 100% CPU. During
> this time I/O to our gfs filesystem seems to be blocked and container
> performance goes down the drain.
> 
Can you take a netstat -t while the CPU usage is at 100%? That will tell
us whether there is queued data at that point in time.
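If it helps to pick the interesting lines out of that output, here is a
minimal Python sketch that keeps only the TCP sockets with a non-zero
Recv-Q or Send-Q. It assumes the usual Linux netstat column layout
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The
function name, the addresses and the sample text are made up for
illustration, and port 21064 is assumed to be the DLM port on your
setup, so check that against your own configuration.

```python
# Illustrative sample only: these addresses and queue sizes are invented.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:22             10.0.0.9:51234          ESTABLISHED
tcp        0  48216 10.0.0.1:21064          10.0.0.2:40112          ESTABLISHED
"""


def queued_sockets(netstat_output, port=None):
    """Return (local, remote, recv_q, send_q) for TCP sockets with queued data.

    If port is given, only sockets whose local or remote address ends in
    that port are considered.
    """
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp Recv-Q Send-Q Local Foreign State
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # not a data row after all
        local, remote = fields[3], fields[4]
        if port is not None:
            suffix = ":%d" % port
            if not (local.endswith(suffix) or remote.endswith(suffix)):
                continue
        if recv_q > 0 or send_q > 0:
            results.append((local, remote, recv_q, send_q))
    return results


# Assumption: 21064 is the DLM TCP port in use on this cluster.
for entry in queued_sockets(SAMPLE, port=21064):
    print(entry)
```

On a live node the text could come from
subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
instead of the sample. A Send-Q that stays large on the DLM connection
while dlm_send is busy would suggest data is backing up on that link.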

> 
> My question is: what causes dlm_send to use 100% CPU and is this what
> causes the low GFS performance? Based on what the servers are doing
> I'm not expecting any deadlocks (they're mostly accessing separate
> parts of the filesystem), so I'm suspecting some other kind of
> limitation here. Could it be the 100Mbit network?
> 
Well, that depends on how much traffic there is... have you measured the
traffic when the problem is occurring?
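One rough way to measure it is to sample the interface byte counters
in /proc/net/dev twice and compare the difference with what a 100 Mbit
link can carry in that interval. The Python below is only a sketch of
that arithmetic: the function names and the interface name eth0 are
hypothetical, and it ignores counter wrap-around and protocol overhead.

```python
def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue
        try:
            # Field 0 is received bytes, field 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
        except ValueError:
            continue
    return counters


def link_utilisation(before, after, interval_s, link_mbit=100.0):
    """Return (rx, tx) as fractions of link capacity between two samples.

    before and after are (rx_bytes, tx_bytes) tuples taken interval_s
    seconds apart.
    """
    capacity_bytes = link_mbit * 1_000_000 / 8 * interval_s
    rx = (after[0] - before[0]) / capacity_bytes
    tx = (after[1] - before[1]) / capacity_bytes
    return rx, tx


# Invented counter values, one second apart, on a 100 Mbit link.
rx, tx = link_utilisation((0, 0), (12_500_000, 6_250_000), interval_s=1)
print("rx %.0f%%, tx %.0f%%" % (rx * 100, tx * 100))
```

On a real node you would read /proc/net/dev, sleep for the interval,
read it again, and pass the two tuples for the cluster interface
(eth0 here is just a placeholder) to link_utilisation. Values close to
1.0 during the stalls would point at the 100 Mbit network as the limit.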

> 
> I've looked into the waiters queue using debugfs and it varies
> between 0 and 60 entries, which doesn't seem too bad to me. The locks
> table has some 30,000 locks. All DLM and GFS settings are defaults.
> Any hints on where to look are appreciated!
> 
It does sound like a performance issue, and it shouldn't be too hard to
get to the bottom of what is going on.

Steve.

> 
> Regards,
> 
> 
> Martijn Storck
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster