[Linux-cluster] I/O scheduler and performance

Ramon van Alteren ramon at vanalteren.nl
Wed Jul 5 08:06:16 UTC 2006


Wendy Cheng wrote:
> On Wed, 2006-07-05 at 00:16 +0200, Ramon van Alteren wrote:
>> I wondered if it mattered what default I/O scheduler I choose in my
>> kernel setup for gfs performance.
>> I've been looking in /sys/block but can't find any place to set the I/O
>> scheduler for the devices I run gfs off, and looking at the docs it seems
>> that gfs uses the VFS layer in the linux kernel for its reading & writing.
>>
>> If I'm correct, that should mean that the scheduler influences
>> performance, right? If so, which one would be beneficial for gfs performance
>> in general (if such a statement can be made), and which scheduler would be
>> beneficial for a workload that consists of mostly writes and fairly few
>> reads (90% vs 10%)?

> The io scheduler does influence GFS performance and the general rules of
> linux IO and filesystem tuning can be applied to GFS - e.g. if you have
> lots of random writes all over the disk partitions, try to avoid the
> schedulers that will attempt to do merging and sorting. In reality, I've
> never found one single io scheduler that can outperform all others in
> all types of IO workloads, even in mostly-write or mostly-read cases.
> The performance is very much dependent on individual workloads (random
> write, sequential writes, file size, directory setup), system
> configurations (memory size, disk array types, etc), and sometimes
> cluster and disk layouts. You have to actually experiment with or
> benchmark your workload before you can be sure of the choice.

OK, thanks for the fast answer.
The annoying problem is that I can't find a way to switch schedulers at
runtime for the gfs-based storage (Coraid storage connected via ATA over
Ethernet). So I suspect that I need to either change the default scheduler
compiled into the kernel and reboot, or build all schedulers as modules
and load/unload them and retest.
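
For reference, on devices that expose a standard request queue the active
scheduler shows up under /sys/block/<dev>/queue/scheduler and can be
switched on the fly. Below is a minimal sketch of what I'd expect to work;
the device name is purely an example, and the knob may simply be absent
for the AoE devices here, which would explain what I'm seeing:

#!/usr/bin/env python
# Sketch: inspect and switch the I/O scheduler through sysfs.
# Assumes the device registers a request queue under /sys/block/<dev>/queue;
# "sda" below is an illustrative placeholder, not my actual device.
import os
import sys

DEV = "sda"
PATH = "/sys/block/%s/queue/scheduler" % DEV

if not os.path.exists(PATH):
    # Some drivers don't expose this file at all, in which case only the
    # compiled-in default (or the elevator= boot parameter) applies.
    sys.exit("%s not found; runtime switching not available" % PATH)

# The file lists the available schedulers with the active one in brackets,
# e.g. "noop anticipatory deadline [cfq]".
print(open(PATH).read().strip())

# Writing one of the listed names activates it (needs root, and the
# scheduler must be compiled in or loaded as a module).
open(PATH, "w").write("deadline\n")
print(open(PATH).read().strip())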

Getting a workload to test with reliably is another problem.
I'm currently using bonnie++ from multiple hosts to test throughput.

If anyone on the list is aware of another benchmark tool to generate write
workloads I'd be grateful for a pointer.
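
In the meantime, to approximate the 90/10 write/read mix I could knock
together something simple; a crude sketch (the path, sizes and counts are
made-up values, and it's no substitute for a proper benchmark tool):

#!/usr/bin/env python
# Crude 90%-write / 10%-read workload generator -- illustration only.
# TARGET_DIR, FILE_SIZE and NUM_OPS are made-up values for the sketch.
import os
import random
import time

TARGET_DIR = "/mnt/gfs/bench"   # hypothetical gfs mount point
FILE_SIZE = 1024 * 1024         # 1 MB per file
NUM_OPS = 1000
WRITE_RATIO = 0.9

if not os.path.isdir(TARGET_DIR):
    os.makedirs(TARGET_DIR)

block = "x" * FILE_SIZE
written = []

start = time.time()
for i in range(NUM_OPS):
    if random.random() < WRITE_RATIO or not written:
        # Write path: create a new file and force it out to disk.
        name = os.path.join(TARGET_DIR, "f%06d" % i)
        f = open(name, "w")
        f.write(block)
        f.flush()
        os.fsync(f.fileno())
        f.close()
        written.append(name)
    else:
        # Read path: re-read a file written earlier.
        f = open(random.choice(written))
        f.read()
        f.close()

elapsed = time.time() - start
print("%d ops in %.1f s, about %.1f MB/s written"
      % (NUM_OPS, elapsed, len(written) * FILE_SIZE / elapsed / (1024 * 1024)))

Running it from two nodes against the same directory should also show
whether the inter-node locking mentioned below is part of the picture.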

> On the other hand, be aware of the cluster filesystem nature of GFS -
> that is, if you try to access the same file (or directory) from
> different nodes, the inter-node locking and sync issues must be
> considered. For example, if you do frequent writes on one node and
> mingle the writes with immediate reads (of the same file) on another
> node, you may see performance drop significantly. This is because the
> write node has to obtain an exclusive lock, write the file, and sync the
> changes to disk before it can be read by the other node, compared
> with a single-node filesystem where no inter-node locking (network
> latency) is involved and the read could obtain its data from the memory
> cache without actual disk IOs.

This brings up a second question. While researching last night I found
some documents on the net that seem to indicate that gfs uses (or used
to use) directory-based locking for writing between the nodes.
E.g., in order to write a file, the nodes pass around a directory lock.
However, much of the documentation floating around on the internet is
outdated and seems to refer to older versions of gfs.

I haven't found any docs describing the locking process with the latest
gfs code and the dlm.

I'm currently seeing a significant drop in throughput between an xfs
filesystem on the shared storage mounted on a single host and a gfs
filesystem on the same shared storage, also mounted on a single host.

I'm getting roughly 75Mb/s throughput on the "normal" fs and 27Mb/s on the
gfs fs.

Any pointers on additional info and/or advice would be very much
appreciated.

Ramon



