[Linux-cluster] Node lag
Frank Schliefer
f_schliefer at vcc.de
Mon Feb 13 10:04:02 UTC 2006
Hi,
yeah, you're right, wrong file size.
Here are the test results with a 2048 MB file size. The RAID itself holds
1024 MB of cache in RAM.
# tiotest -f 2048
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item | Time | Rate | Usr CPU | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 8192 MBs | 58.6 s | 139.801 MB/s | 28.0 % | 987.1 % |
| Random Write 16 MBs | 1.9 s | 8.435 MB/s | 0.9 % | 29.6 % |
| Read 8192 MBs | 54.9 s | 149.176 MB/s | 16.6 % | 171.7 % |
| Random Read 16 MBs | 10.4 s | 1.509 MB/s | 0.2 % | 4.1 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write | 0.108 ms | 980.220 ms | 0.00000 | 0.00000 |
| Random Write | 1.237 ms | 198.483 ms | 0.00000 | 0.00000 |
| Read | 0.104 ms | 185.499 ms | 0.00000 | 0.00000 |
| Random Read | 10.178 ms | 116.995 ms | 0.00000 | 0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total | 0.117 ms | 980.220 ms | 0.00000 | 0.00000 |
`--------------+-----------------+-----------------+----------+-----------'
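For future runs, here is a quick way to pick a file size that is guaranteed to overrun both the node's page cache and the RAID controller cache. This is just a rough rule of thumb I use (the 2x factor is only a safety margin, nothing tiotest itself requires):

```shell
# Pick a tiotest file size that overruns both the page cache and the RAID cache.
NODE_RAM_MB=2048    # RAM per node
RAID_CACHE_MB=1024  # RAID controller cache
# Double the combined cache sizes as a safety margin.
SAFE_SIZE_MB=$(( (NODE_RAM_MB + RAID_CACHE_MB) * 2 ))
echo "tiotest -f $SAFE_SIZE_MB"
```

With the numbers above this prints `tiotest -f 6144`, comfortably past anything the caches can absorb.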
So with the right file size the IO is largely the same on all nodes, but
the question remains: why is the random read/write performance so bad?
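To put a number on "bad", the random-read rate can be converted into rough IOPS. This assumes tiotest's default 4 KiB block size (an assumption on my part) and treats 1 MB as roughly 1000 KB for a back-of-the-envelope figure:

```shell
# Rough aggregate IOPS implied by the 1.509 MB/s random-read rate.
RATE_KB_PER_S=1509  # ~1.509 MB/s, rounded to KB/s
BLOCK_KB=4          # assumed tiotest block size (4 KiB)
IOPS=$(( RATE_KB_PER_S / BLOCK_KB ))
echo "$IOPS"        # aggregate across all 4 threads
```

That comes out to roughly 377 IOPS total, or about 24 IOPS per spindle across the 16 disks, which is well below what even a single drive can sustain. That suggests the bottleneck is not the disks themselves.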
More info about the systems:
Each node has 2048 MB of RAM and dual Xeon CPUs.
As FC controller we are using the QLogic Corp. QLA2312.
As switch and for fencing, the QLogic 5202.
The RAID itself is an easyRAID Q16+ with 16 disks, and it performs very
well under e.g. XFS.
Any further hints?
--
----
Frank Schliefer
Kovacs, Corey J. schrieb:
> Also, I think it might be interesting to see what happens when you use
> data sizes that will overrun any caching being done. I've seen great
> performance using a simple MSA1000 as long as there is a lot of cache
> available on the SAN itself. As soon as I run tests with data sets
> larger than the cache size, the performance falls to the floor. Unless
> you're overloading the cache, you might not be getting a true metric of
> what's really getting written to disk.
>
> Maybe the slow node is getting hit by cache overhead from the SAN?
>
>
> Just a thought
>
>
> Corey
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
> Sent: Thursday, February 09, 2006 9:18 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Node lag
>
> Frank Schliefer wrote:
>
>>Hi,
>>
>>after setting up a four-node cluster, we have one node that is way
>>slower than the other 3 nodes.
>>
>>We are using e.g. tiotest for benchmarking the GFS.
>>
>>Normal Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item | Time | Rate | Usr CPU | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write 40 MBs | 0.2 s | 227.426 MB/s | 36.4 % | 384.4 % |
>>| Random Write 16 MBs | 0.1 s | 143.405 MB/s | 58.7 % | 146.9 % |
>>| Read 40 MBs | 0.0 s | 2558.199 MB/s | 307.0 % | 1228.0 % |
>>| Random Read 16 MBs | 0.0 s | 2685.169 MB/s | 550.0 % | 1374.9 % |
>>`----------------------------------------------------------------------'
>>
>>
>>Slow Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item | Time | Rate | Usr CPU | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write 40 MBs | 1.4 s | 27.687 MB/s | 2.2 % | 121.8 % |
>>| Random Write 16 MBs | 4.2 s | 3.695 MB/s | 0.0 % | 7.9 % |
>>| Read 40 MBs | 0.0 s | 2228.288 MB/s | 89.1 % | 1337.1 % |
>>| Random Read 16 MBs | 0.0 s | 2252.739 MB/s | 230.7 % | 692.1 % |
>>`----------------------------------------------------------------------'
>>
>>Any hints why this could happen?
>>
>>Using kernel 2.6.15.2 (sorry no RH)
>
>
> It would be helpful if you could give us more information about your
> installation: disk topology, lock manager in use (and which nodes are
> lockservers if using GULM) and whether it matters which nodes are started
> first or not.
>