[Linux-cluster] GFS + CORAID Performance Problem
Wendy Cheng
wcheng at redhat.com
Mon Dec 11 03:38:52 UTC 2006
Wendy Cheng wrote:
> bigendian+gfs at gmail.com wrote:
>
>> I've just set up a new two-node GFS cluster on a CORAID sr1520
>> ATA-over-Ethernet. My nodes are each quad dual-core Opteron CPU
>> systems with 32GB RAM each. The CORAID unit exports a 1.6TB block
>> device on which I have a GFS file system.
>>
>> I seem to be having performance issues where certain read system
>> calls take up to three seconds to complete. My test app is bonnie++,
>> and the slowdowns appear to happen in the "Rewriting" portion of
>> the test, though I'm not sure if this is exclusive. If I watch top
>> and iostat for the device in question, I see activity on the device,
>> then long (up to three-second) periods of no apparent I/O. During
>> the periods of no I/O the bonnie++ process is blocked on disk I/O, so
>> it seems that the system is trying to do something. Network traces
>> seem to show that the host machine is not waiting on the RAID array,
>> and the packet following the dead period always seems to be sent from
>> the host to the CORAID device. Unfortunately, I don't know how to
>> dig in any deeper to figure out what the problem is.
>
Wait ... sorry, I didn't read carefully. Now I see the 3-second delay in
the strace. That doesn't look like a bonnie++ issue. Does bonnie++ run
on a single node, or do you dispatch it on both nodes (in different
directories)? This is more complicated than I originally expected
(since this is a network block device). I need to think about how to
catch the culprit; it could be a memory issue, though. Could you try
running bonnie++ with 4GB of memory to see whether the 3-second read
delay still shows up?
-- Wendy
>>
> I think we know about this issue. Note that bonnie++ doesn't keep the
> file size in the benchmark's local memory; it always invokes a
> stat() system call to poll for the file size before it can do a read
> or rewrite. GFS1 has a known performance issue with the stat system
> call (which we hope GFS2 will address), and since file sizes in
> bonnie++ tend to be small, the stat() overhead becomes very
> obvious. It will be worse in your case due to the file system size.
>
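To check whether individual stat() calls, rather than the reads themselves, account for the long stalls, one could time them directly. Below is a minimal sketch in C, assuming a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC. The helper names now_ms() and max_stat_latency_ms() are hypothetical and are not part of bonnie++ or GFS; the loop only imitates a benchmark that polls the file size before each pass.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Hypothetical helper: call stat() on `path` `iterations` times, the way a
 * benchmark that re-reads the file size before every read/rewrite pass would.
 * Stores the last observed size in *size_out (if non-NULL) and returns the
 * slowest single stat() call in milliseconds, or -1.0 if any call fails. */
static double max_stat_latency_ms(const char *path, int iterations,
                                  off_t *size_out) {
    double worst = 0.0;
    for (int i = 0; i < iterations; i++) {
        struct stat st;
        double t0 = now_ms();
        if (stat(path, &st) != 0)
            return -1.0;              /* e.g. file missing or not permitted */
        double dt = now_ms() - t0;
        if (dt > worst)
            worst = dt;               /* keep only the worst-case latency */
        if (size_out)
            *size_out = st.st_size;
    }
    return worst;
}
```

Running this against a file on the GFS mount and against the same file on a local disk, with a few thousand iterations each, would suggest whether stat() latency alone gets anywhere near the seconds range; it does not by itself distinguish lock traffic from network or memory effects.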
Hmm, wait ... I didn't check your strace carefully. Now I see that
3-second delay ...