[Linux-cluster] I/O scheduler and performance
Ramon van Alteren
ramon at vanalteren.nl
Thu Jul 6 09:31:04 UTC 2006
Wendy Cheng wrote:
>On Wed, 2006-07-05 at 10:06 +0200, Ramon van Alteren wrote:
>>The annoying problem is that I can't find a way to switch schedulers at
>>runtime for the gfs based storage (coraids connected with ata over
>>ethernet). So I suspect that I need to change the default scheduler
>>compiled into the kernel and reboot, or build all schedulers as modules
>>and load/unload the modules and retest.
>Which kernel version are you running? For RHEL 4 (2.6.9 based),
>it is just a matter of specifying a boot-time parameter and rebooting - no
>need to recompile the kernel and/or modules. Newer versions of the community
>kernel on kernel.org (say 2.6.17) may have even more flexible methods.
I'm running a 2.6.16 kernel with gentoo patches and the latest stable
cluster sources built as modules.
I have reconfigured my grub setup so it just takes a reboot to change
the scheduler.
For local (fixed) disks I can change the scheduler without a reboot by
writing to /sys/block/sda/queue/scheduler
Sadly this isn't possible for shared storage so I need a reboot.
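
As a rough illustration of the sysfs interface mentioned above, the following Python sketch reads the scheduler list for a block device and selects a different one. The helper names and the `sysfs_root` parameter are made up for this example. The file format assumed here (space-separated scheduler names with the active one in square brackets) is what 2.6 kernels expose for local disks; writing to the file requires root, and the sketch does nothing for devices that do not offer a writable queue/scheduler file.

```python
import os


def scheduler_path(device, sysfs_root="/sys/block"):
    """Build the path to the scheduler file, e.g. /sys/block/sda/queue/scheduler."""
    return os.path.join(sysfs_root, device, "queue", "scheduler")


def parse_schedulers(text):
    """Parse a line such as 'noop anticipatory deadline [cfq]'.

    Returns (available, active): the list of scheduler names and the one
    shown in square brackets (None if no name is bracketed).
    """
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def read_schedulers(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for one block device."""
    with open(scheduler_path(device, sysfs_root)) as f:
        return parse_schedulers(f.read())


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Select a scheduler by writing its name to the sysfs file (needs root)."""
    available, _ = read_schedulers(device, sysfs_root)
    if name not in available:
        raise ValueError("scheduler %r not offered for %s" % (name, device))
    with open(scheduler_path(device, sysfs_root), "w") as f:
        f.write(name)
```

Calling `read_schedulers("sda")` before and after `set_scheduler("sda", "deadline")` is a quick way to confirm that the change took effect on a local disk.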
>>This brings up a second question. While researching last night I found
>>some documents on the net that seem to indicate that gfs uses (or used
>>to use) directory based locking for writing between the nodes.
>>E.g. in order to write a file the nodes pass around a directory lock.
>>However much of the documentation floating around on the internet is
>>outdated and seems to refer to older versions of gfs.
>If your write has something to do with directory (say "create"), then
>directory lock is required. Otherwise, the lock obtained is only
>associated with the file itself.
We write lots of files in the same directory; such locking would have
been pretty disastrous.
>>I haven't found any docs describing the locking process with the latest
>>gfs code and the dlm.
>>I'm currently seeing a significant drop in throughput between a xfs
>>filesystem on the shared storage mounted on a single host and a gfs
>>filesystem on the shared storage mounted on a single host.
>>I'm getting roughly 75Mb/s throughput on the "normal" fs and 27Mb/s on a
>>gfs filesystem.
>GFS in general doesn't perform well under bonnie++ due to its extensive
>use of the "stat()" system call. This is because bonnie++ doesn't know the
>exact file size during the runs, so it has to do a "stat()" to retrieve
>the size to decide how to allocate its read/write buffer before *each*
>read/write. The "stat()" system call happens to be very expensive in
>GFS.
>So check out your IO calls - do you really need to do lots of "stat()"
>system calls? Otherwise, switch to another benchmark (such as IOZONE)
>and you may find the numbers differ greatly.
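
To get a feel for how expensive "stat()" is on a given mount, a small timing helper like the Python sketch below can be pointed at a file on the gfs filesystem and at a file on a local filesystem. The function name and the call count are arbitrary choices for this example, not part of any benchmark tool. Repeated calls from a single node may be answered from cached state, so the result may understate the cost seen when several nodes touch the same file.

```python
import os
import time


def time_stat_calls(path, count=1000):
    """Return the average wall-clock seconds per os.stat() call on path."""
    start = time.perf_counter()
    for _ in range(count):
        os.stat(path)
    return (time.perf_counter() - start) / count
```

Comparing the value returned for a path on gfs with the value for a path on a local disk gives a rough ratio rather than an absolute measurement.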
OK, I reran the tests with iozone and it shows a difference but not much.
roughly 75Mb/s throughput with a "local" fs and 30Mb/s throughput on gfs.
I still need to do the concurrent write test (writing over gfs from
multiple hosts in the cluster).
And I'm running tests with different schedulers.
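
For quick comparisons between scheduler settings or between nodes, a minimal sequential-write timer along the following lines can complement iozone. It is only a sketch: the function name, the default sizes, and the choice to include the final fsync in the timing are assumptions made for this example, and it measures a single writer only. For the concurrent case it would have to be started on several hosts at once, each writing its own file.

```python
import os
import time


def write_throughput(path, total_mb=64, block_kb=1024):
    """Write total_mb megabytes to path in block_kb-sized blocks.

    Returns the throughput in megabytes per second, with the final
    fsync included in the measured time.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024.0) / elapsed
```

Running it with the same arguments on the xfs mount and on the gfs mount gives numbers that are comparable with each other, though not necessarily with the iozone figures above.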
To be stupid and selfish and to have good health are the three requirements for happiness, though if stupidity is lacking, the others are useless.