[Linux-cluster] I/O scheduler and performance

Wed Jul 5 15:02:58 UTC 2006

On Wed, 2006-07-05 at 10:06 +0200, Ramon van Alteren wrote:

> The annoying problem is that I can't find a way to switch schedulers on
> runtime for the gfs based storage (coraids connected with ata over
> ethernet) So I suspect that I need to change the default scheduler
> compiled in the kernel and reboot, or build all schedulers as modules
> and load/unload the modules and retest.

Which kernel version are you running? On RHEL 4 (2.6.9 based), it is
just a matter of specifying a boot-time parameter and rebooting - no
need to recompile the kernel and/or modules. Newer community kernels
from kernel.org (say 2.6.17) may have even more flexible methods.
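
As a rough sketch of both methods: the boot-time parameter is
"elevator=<name>" (e.g. "elevator=deadline"), and on kernels that
support runtime switching the file /sys/block/<dev>/queue/scheduler
lists the available schedulers with the active one in square brackets.
Writing a name to that file switches it. The helper names and the
device name "sda" below are illustrative assumptions, and not every
block device exposes this file.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split e.g. 'noop anticipatory deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for name in line.split():
        # The kernel marks the active scheduler with square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    # "device" is a name under /sys/block, e.g. "sda" (assumed for illustration).
    return Path("/sys/block") / device / "queue" / "scheduler"


def read_scheduler(device):
    """Return (active, available) for the given block device."""
    return parse_scheduler_line(scheduler_path(device).read_text().strip())


def set_scheduler(device, name):
    """Switch the scheduler at runtime; needs root, fails on unknown names."""
    scheduler_path(device).write_text(name)


if __name__ == "__main__":
    print(read_scheduler("sda"))
```

If the file is missing for your device, the boot parameter is the
fallback.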

> 
> Getting a workload to test with reliably is another problem.
> I'm currently using bonnie++ from multiple hosts to test throughput.
> 
> If anyone on list is aware of another benchmark tool to generate write
> workloads I'd be grateful for a pointer.

http://www.iozone.org/ (iozone)

> 
> This brings up a second question. While researching last night I found
> some documents on the net that seem to indicate that gfs uses (or used
> to use) directory based locking for writing between the nodes.
> E.g. in order to write a file the nodes pass around a directory lock.
> However much of the documentation floating around on the internet is
> outdated and seems to refer to older versions of gfs.

If your write involves the directory itself (say a "create"), then a
directory lock is required. Otherwise, the only lock obtained is the
one associated with the file itself.

> 
> I haven't found any docs describing the locking process with the latest
> gfs code and the dlm.
> 
> I'm currently seeing a significant drop in throughput between a xfs
> filesystem on the shared storage mounted on a single host and a gfs
> filesystem on the shared storage mounted on a single host.
> 
> I'm getting roughly 75Mb/s throughput on the "normal" fs and 27Mb/s on a
> gfs fs.
> 

GFS in general doesn't perform well under bonnie++ because of its
extensive use of the "stat()" system call. bonnie++ doesn't know the
exact file size during its runs, so it has to call "stat()" to retrieve
the size and decide how to allocate its read/write buffer before *each*
read/write. The "stat()" system call happens to be very expensive in
GFS.
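
To get a feel for the cost on your own mounts, something like the
following minimal sketch times repeated "stat()" calls on one file. The
function name and the default call count are my own choices for
illustration; run it against a file on the GFS mount and one on the
local fs and compare the averages.

```python
import os
import sys
import time


def time_stat_calls(path, count=10000):
    """Return the average wall-clock seconds per os.stat() call on path."""
    start = time.perf_counter()
    for _ in range(count):
        os.stat(path)  # one stat() system call per iteration
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    # Usage: python stat_timing.py /path/on/gfs /path/on/local
    for target in sys.argv[1:]:
        avg = time_stat_calls(target)
        print("%s: %.1f microseconds per stat()" % (target, avg * 1e6))
```

Note that this only measures repeated calls from a single node, so it
will not show the cost of lock traffic between nodes.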

So check your IO calls - do you really need to make lots of "stat()"
system calls? If not, switch to another benchmark (such as IOZONE)
and you may find the numbers differ greatly.
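
If you just want a quick write workload that makes no "stat()" calls at
all, a sketch along these lines writes fixed-size blocks and reports
the elapsed time. It is not a substitute for iozone; the function name,
sizes and file name are assumptions picked for illustration.

```python
import os
import tempfile
import time


def write_throughput(path, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024):
    """Write total_bytes sequentially to path; return (bytes_written, seconds)."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last block may be shorter than block_size.
            chunk = min(block_size, total_bytes - written)
            f.write(block[:chunk])
            written += chunk
        f.flush()
        os.fsync(f.fileno())  # include the time to push data to storage
    return written, time.perf_counter() - start


if __name__ == "__main__":
    # Writes into a temporary directory; point it at the mount under test instead.
    with tempfile.TemporaryDirectory() as d:
        n, secs = write_throughput(os.path.join(d, "testfile"))
        print("%d bytes in %.2f s" % (n, secs))
```

Run one copy per host against different files to approximate the
multi-host test you are doing with bonnie++.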

-- Wendy