[Linux-cluster] Node lag
Frank Schliefer
f_schliefer at vcc.de
Mon Feb 13 10:04:02 UTC 2006
Hi,
yeah, you're right, wrong file size.
Here are the test results with a 2048 MB file size. The RAID itself holds
1024 MB of cache in RAM.
# tiotest -f 2048
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item | Time | Rate | Usr CPU | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 8192 MBs | 58.6 s | 139.801 MB/s | 28.0 % | 987.1 % |
| Random Write 16 MBs | 1.9 s | 8.435 MB/s | 0.9 % | 29.6 % |
| Read 8192 MBs | 54.9 s | 149.176 MB/s | 16.6 % | 171.7 % |
| Random Read 16 MBs | 10.4 s | 1.509 MB/s | 0.2 % | 4.1 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write | 0.108 ms | 980.220 ms | 0.00000 | 0.00000 |
| Random Write | 1.237 ms | 198.483 ms | 0.00000 | 0.00000 |
| Read | 0.104 ms | 185.499 ms | 0.00000 | 0.00000 |
| Random Read | 10.178 ms | 116.995 ms | 0.00000 | 0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total | 0.117 ms | 980.220 ms | 0.00000 | 0.00000 |
`--------------+-----------------+-----------------+----------+-----------'
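For future runs, here is a quick way to pick a file size that is guaranteed to overrun both the node's page cache and the RAID controller cache. This is just a rough rule of thumb I use (the 2x factor is only a safety margin, nothing tiotest itself requires):

```shell
# Pick a tiotest file size that overruns both the page cache and the RAID cache.
NODE_RAM_MB=2048    # RAM per node
RAID_CACHE_MB=1024  # RAID controller cache
# Double the combined cache sizes as a safety margin.
SAFE_SIZE_MB=$(( (NODE_RAM_MB + RAID_CACHE_MB) * 2 ))
echo "tiotest -f $SAFE_SIZE_MB"
```

With the numbers above this prints `tiotest -f 6144`, comfortably past anything the caches can absorb.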
So with the right file size the IO is largely the same on all nodes, but
the question remains: why is the random read/write performance so bad?
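To put a number on "bad", the random-read rate can be converted into rough IOPS. This assumes tiotest's default 4 KiB block size (an assumption on my part) and treats 1 MB as roughly 1000 KB for a back-of-the-envelope figure:

```shell
# Rough aggregate IOPS implied by the 1.509 MB/s random-read rate.
RATE_KB_PER_S=1509  # ~1.509 MB/s, rounded to KB/s
BLOCK_KB=4          # assumed tiotest block size (4 KiB)
IOPS=$(( RATE_KB_PER_S / BLOCK_KB ))
echo "$IOPS"        # aggregate across all 4 threads
```

That comes out to roughly 377 IOPS total, or about 24 IOPS per spindle across the 16 disks, which is well below what even a single drive can sustain. That suggests the bottleneck is not the disks themselves.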
More info about the systems:
Each node has 2048 MB of RAM and dual Xeon CPUs.
As FC controller we are using the QLogic Corp. QLA2312.
As switch and for fencing, the QLogic 5202.
The RAID itself is an easyRAID Q16+ with 16 disks, and it performs very
well under e.g. XFS.
Any further hints?
--
----
Frank Schliefer
Kovacs, Corey J. schrieb:
> Also, I think it might be interesting to see what happens when you use
> data sizes that will overrun any caching being done. I've seen great
> performance using a simple MSA1000 as long as there is a lot of cache
> available on the SAN itself. As soon as I run tests with data sets
> larger than the cache size, the performance falls to the floor. Unless
> you're overloading the cache, you might not be getting a true metric of
> what's really getting written to disk.
>
> Maybe the slow node is getting hit by cache overhead from the SAN?
>
>
> Just a thought
>
>
> Corey
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
> Sent: Thursday, February 09, 2006 9:18 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Node lag
>
> Frank Schliefer wrote:
>
>>Hi,
>>
>>after setting up a four-node cluster, we have one node that is way
>>slower than the other 3 nodes.
>>
>>We are using e.g. tiotest for benchmarking the GFS.
>>
>>Normal Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item | Time | Rate | Usr CPU | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write 40 MBs | 0.2 s | 227.426 MB/s | 36.4 % | 384.4 % |
>>| Random Write 16 MBs | 0.1 s | 143.405 MB/s | 58.7 % | 146.9 % |
>>| Read 40 MBs | 0.0 s | 2558.199 MB/s | 307.0 % | 1228.0 % |
>>| Random Read 16 MBs | 0.0 s | 2685.169 MB/s | 550.0 % | 1374.9 % |
>>`----------------------------------------------------------------------'
>>
>>
>>Slow Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item | Time | Rate | Usr CPU | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write 40 MBs | 1.4 s | 27.687 MB/s | 2.2 % | 121.8 % |
>>| Random Write 16 MBs | 4.2 s | 3.695 MB/s | 0.0 % | 7.9 % |
>>| Read 40 MBs | 0.0 s | 2228.288 MB/s | 89.1 % | 1337.1 % |
>>| Random Read 16 MBs | 0.0 s | 2252.739 MB/s | 230.7 % | 692.1 % |
>>`----------------------------------------------------------------------'
>>
>>Any hints why this could happen?
>>
>>Using kernel 2.6.15.2 (sorry no RH)
>
>
> It would be helpful if you could give us more information about your
> installation: disk topology, lock manager in use (and which nodes are
> lockservers if using GULM) and whether it matters which nodes are started
> first or not.
>