[Linux-cluster] I/O scheduler and performance

Thu Jul 6 09:31:04 UTC 2006

Wendy Cheng wrote:

>On Wed, 2006-07-05 at 10:06 +0200, Ramon van Alteren wrote:
>  
>
>>The annoying problem is that I can't find a way to switch schedulers at
>>runtime for the GFS-based storage (Coraids connected with ATA over
>>Ethernet), so I suspect that I need to change the default scheduler
>>compiled into the kernel and reboot, or build all schedulers as modules
>>and load/unload the modules and retest.
>>    
>>
>
>Which kernel version are you running? For RHEL 4 (2.6.9 based),
>it is just a matter of specifying a boot-time parameter and rebooting; there
>is no need to recompile the kernel and/or modules. Newer community kernels
>from kernel.org (say 2.6.17) may have even more flexible methods.
>  
>
I'm running a 2.6.16 kernel with Gentoo patches and the latest stable
cluster sources built as modules.
I have reconfigured my GRUB setup so it just takes a reboot to change
the I/O scheduler.
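For reference, the boot-time parameter on 2.6 kernels is `elevator=` (for example `elevator=deadline` on the GRUB `kernel` line). The Python sketch below rewrites such a kernel command line so it selects a given scheduler. The helper name `with_elevator`, the example command line, and the set of scheduler names assumed to be built in are illustrative assumptions, not details from this thread.

```python
"""Sketch: rewrite a kernel command line to select an I/O scheduler.

Assumes the 2.6-era boot parameter elevator=<name>. The command line used
below is a made-up example, not a real GRUB config.
"""

# Schedulers assumed to be available in a 2.6.16-era kernel.
KNOWN_SCHEDULERS = {"noop", "anticipatory", "deadline", "cfq"}


def with_elevator(cmdline: str, scheduler: str) -> str:
    """Return cmdline with any existing elevator= token replaced."""
    if scheduler not in KNOWN_SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler!r}")
    # Drop every existing elevator=... token; keep all other tokens in order.
    tokens = [t for t in cmdline.split() if not t.startswith("elevator=")]
    tokens.append(f"elevator={scheduler}")
    return " ".join(tokens)


if __name__ == "__main__":
    line = "root=/dev/sda3 ro elevator=cfq"
    print(with_elevator(line, "deadline"))  # root=/dev/sda3 ro elevator=deadline
```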

For local (fixed) disks I can change the scheduler without a reboot by
writing to /sys/block/sda/queue/scheduler.
Sadly this isn't possible for the shared storage, so I need a reboot.
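A minimal Python sketch of the runtime switch described above for local disks, assuming the usual sysfs format in which the active scheduler is shown in square brackets (e.g. `noop deadline [cfq]`). The function names are hypothetical, and writing the file needs root. The thread does not say why the shared storage cannot be switched, so the sketch only reports a missing `scheduler` file instead of guessing a cause.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_scheduler(text: str) -> Tuple[Optional[str], List[str]]:
    """Split a sysfs scheduler line into (active, available).

    The active scheduler is the bracketed one, e.g. "noop deadline [cfq]".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def scheduler_path(device: str) -> Path:
    """Path of the per-device scheduler file, e.g. for "sda"."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def switch_scheduler(device: str, name: str) -> Optional[str]:
    """Select scheduler `name` for `device` (root only); return the active one."""
    path = scheduler_path(device)
    if not path.exists():
        raise FileNotFoundError(f"{path} not present; no runtime switch for {device}")
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not in {available}")
    path.write_text(name)
    # Re-read to confirm which scheduler the kernel now reports as active.
    active, _ = parse_scheduler(path.read_text())
    return active
```

For example, `switch_scheduler("sda", "deadline")` run as root should return `"deadline"` on a kernel that has the deadline scheduler available.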

>>This brings up a second question. While researching last night I found
>>some documents on the net that seem to indicate that GFS uses (or used
>>to use) directory-based locking for writes between the nodes:
>>for example, in order to write a file the nodes pass around a directory lock.
>>However, much of the documentation floating around on the internet is
>>outdated and seems to refer to older versions of GFS.
>>    
>>
>
>If your write involves the directory (say, a "create"), then the
>directory lock is required. Otherwise, the lock obtained is associated
>only with the file itself.
>  
>
OK, thanks.
We write lots of files in the same directory, so such locking would have
been pretty disastrous.
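Wendy's rule can be illustrated with a toy model. This is a hypothetical Python sketch, not GFS's actual lock code: it assumes that an operation changing directory entries (here `create`, `unlink`, `mkdir`, `rmdir`) takes the parent directory's lock plus the file's own lock, while a plain write to an existing file takes only the file's lock. The paths and lock names are made up. Under that assumption, writing to many existing files in one directory does not contend on the directory, but creating many new files there still does.

```python
import posixpath
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical set of operations that modify a directory's entries.
DIRECTORY_OPS = {"create", "unlink", "mkdir", "rmdir"}


def locks_needed(op: str, path: str) -> Set[str]:
    """Return the (toy) lock names an operation on `path` would take."""
    locks = {f"inode:{path}"}
    if op in DIRECTORY_OPS:
        parent = posixpath.dirname(path) or "/"
        locks.add(f"inode:{parent}")
    return locks


def lock_demand(ops: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count how many distinct hosts need each lock, from (host, op, path)."""
    holders: Dict[str, Set[str]] = {}
    for host, op, path in ops:
        for lock in locks_needed(op, path):
            holders.setdefault(lock, set()).add(host)
    return Counter({lock: len(hosts) for lock, hosts in holders.items()})


if __name__ == "__main__":
    creates = [(f"node{i}", "create", f"/gfs/data/f{i}") for i in range(3)]
    print(lock_demand(creates)["inode:/gfs/data"])  # 3: all nodes want the dir lock
    writes = [(f"node{i}", "write", f"/gfs/data/f{i}") for i in range(3)]
    print(lock_demand(writes)["inode:/gfs/data"])   # 0: the dir lock is untouched
```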

>>I haven't found any docs describing the locking process with the latest
>>GFS code and the DLM.
>>
>>I'm currently seeing a significant drop in throughput when comparing an XFS
>>filesystem on the shared storage mounted on a single host with a GFS
>>filesystem on the same storage, also mounted on a single host.
>>
>>I'm getting roughly 75Mb/s throughput on the "normal" fs and 27Mb/s on the
>>GFS fs.
>>
>>    
>>
>
>GFS in general doesn't perform well under bonnie++ because bonnie++ makes
>extensive use of the "stat()" system call. Since bonnie++ doesn't know the
>exact file size during the runs, it has to do a "stat()" to retrieve the
>size and decide how to allocate its read/write buffer before *each*
>read/write. The "stat()" system call happens to be very expensive in
>GFS.
>
>So check your I/O calls: do you really need that many "stat()" system
>calls? If not, switch to other benchmarks (such as IOzone) and you may
>find the numbers differ greatly.
>  
>
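To see why a per-read `stat()` adds up, here is a Python sketch of the access pattern Wendy describes, contrasted with checking the size once up front. It is not bonnie++'s source; it only counts stat-family calls (`os.fstat`/`os.stat`), on the assumption that each such call is the part that becomes expensive on GFS. `make_sample_file` and the 4096-byte chunk size are arbitrary choices for the demo.

```python
import os
import tempfile
from typing import Tuple


def make_sample_file(n_bytes: int) -> str:
    """Create a throwaway file of n_bytes zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * n_bytes)
    return path


def read_stat_each_time(path: str, chunk: int = 4096) -> Tuple[int, int]:
    """Re-check the file size before every read; return (bytes_read, stat_calls)."""
    stat_calls = 0
    offset = 0
    with open(path, "rb") as f:
        while True:
            size = os.fstat(f.fileno()).st_size
            stat_calls += 1
            if offset >= size:
                break
            data = f.read(min(chunk, size - offset))
            if not data:
                break
            offset += len(data)
    return offset, stat_calls


def read_stat_once(path: str, chunk: int = 4096) -> Tuple[int, int]:
    """Check the size once, then read; return (bytes_read, stat_calls)."""
    size = os.stat(path).st_size
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            data = f.read(min(chunk, size - offset))
            if not data:
                break
            offset += len(data)
    return offset, 1
```

For a 40,960-byte file read in 4,096-byte chunks, the first version makes 11 stat calls (one before each of the 10 reads plus the final size check) while the second makes 1; if each call costs a cluster lock round trip, the difference scales with the number of reads.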
OK, I reran the tests with IOzone and it shows a difference, but not much:
roughly 75Mb/s throughput with a "local" fs and 30Mb/s on GFS.
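Putting the two sets of numbers side by side, this small Python calculation expresses GFS throughput as a fraction of the local-filesystem figure, using only the values reported in this thread (units as reported).

```python
def relative_throughput(gfs: float, local: float) -> float:
    """GFS throughput as a fraction of the local-filesystem throughput."""
    return gfs / local


# (GFS, local) figures quoted in this thread.
results = {"bonnie++": (27.0, 75.0), "IOzone": (30.0, 75.0)}
for bench, (gfs, local) in results.items():
    print(f"{bench}: GFS at {relative_throughput(gfs, local):.0%} of local")
# bonnie++: GFS at 36% of local
# IOzone: GFS at 40% of local
```

So switching benchmarks moved GFS from about 36% to about 40% of local throughput, which matches the "a difference but not much" observation.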

I still need to do the concurrent write test (writing to GFS from
multiple hosts in the cluster), and I'm running tests with different
schedulers.
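For the concurrent write test, one simple approach (a sketch, not a tuned benchmark) is to run the same writer on every node at roughly the same time, each writing its own file into one shared GFS directory, and compare the per-node rates with the single-node figure. The function name, file-naming scheme, block size and temp-directory fallback are assumptions for illustration; rates are in MB/s (10^6 bytes per second).

```python
import os
import socket
import tempfile
import time
from typing import Optional, Tuple


def write_test(directory: Optional[str], total_bytes: int,
               block: int = 1 << 20) -> Tuple[str, float]:
    """Write total_bytes to a per-host file, fsync it, then delete it.

    Returns (path_used, MB/s). directory=None falls back to the local temp
    directory, which is only useful as a dry run; point it at the GFS mount.
    """
    directory = directory or tempfile.gettempdir()
    # Unique per host and process so nodes never write the same file.
    path = os.path.join(directory,
                        f"iotest-{socket.gethostname()}-{os.getpid()}.dat")
    buf = b"\0" * block
    start = time.monotonic()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            n = min(block, total_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = time.monotonic() - start
    os.remove(path)
    rate = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return path, rate
```

On each node one would call something like `write_test("/mnt/gfs/shared", 512 << 20)` (a hypothetical mount path), repeating the run once per scheduler setting so both sets of tests share the same harness.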

Grtz Ramon
