[Linux-cluster] GFS (1 & partially 2) performance problems

Michael Lackner michael.lackner at mu-leoben.at
Mon Jun 14 14:21:44 UTC 2010


Hello!

Thanks for your reply. Unfortunately, I forgot to mention HOW I was
actually testing, stupid of me.

I tested with dd, doing 4kB blocksize reads and writes, 160GB total 
testfile size per node.
I read from /dev/zero for writing tests and wrote to /dev/null for 
reading tests. So, totally
sequential, somewhat small blocksize (equal to filesystem BS).
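
In essence, the runs looked roughly like this (just a sketch from memory,
the mountpoint and paths are placeholders, not the exact ones I used):

  # write test: ~160GB per node, one file per node in its own subdirectory
  dd if=/dev/zero of=/mnt/gfs/node1/testfile bs=4k count=41943040
  sync
  # read test on the same file
  dd if=/mnt/gfs/node1/testfile of=/dev/null bs=4k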

The performance was measured directly on the Fibrechannel Switch, which 
offers nice
per-port monitoring for that purpose.

I have yet to do some serious read testing on GFS2. I aborted my GFS2
tests as write performance was not up to GFS1 to begin with. My older
GFS2 benchmarks (I did this with a 2-node configuration before) are
lost, so I will need to re-do them to give you some numbers.

After each write test I did a "sync" to flush everything to disks. I
did not do this before or after the read tests, though.

Since you mentioned journal size: "gfs_tool counters <mountpoint>"
reported that only 2-3% of the log space was in use after the tests
(I guess this is the per-node fs journal?).

As for the direct I/O tests, do you mean testing without ANY caching
going on, i.e. a synchronous write? What I did before was test EXT3
(~190MB/s) and XFS (~320MB/s) on the Storage Array. I think what I'm
getting here is raw throughput, since I am not monitoring in the OS,
but at the Fibrechannel Switch itself.
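
If that is what you mean, I could re-run the dd tests with direct I/O,
something along these lines (again just a sketch, same placeholder paths
as above):

  # direct I/O write test, bypassing the page cache
  dd if=/dev/zero of=/mnt/gfs/node1/testfile bs=4k count=41943040 oflag=direct
  # direct I/O read test
  dd if=/mnt/gfs/node1/testfile of=/dev/null bs=4k iflag=direct

I would probably also try a larger blocksize for those runs, since 4kB
direct I/O requests will likely be quite slow on their own.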

I will do GFS2 read tests similar to those conducted for GFS1. I'll be
able to do that tomorrow morning, then I can post the numbers here.

Thanks!

Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2010-06-14 at 14:00 +0200, Michael Lackner wrote:
>   
>> Hello!
>>
>> I am currently building a Cluster sitting on CentOS 5 for GFS usage.
>>
>> At the moment, the storage subsystem consists of an HP MSA2312
>> Fibrechannel SAN linked to an FC 8gbit switch. Three client machines
>> are connected to that switch over 8gbit FC. The disks themselves are
>> 12 * 15.000rpm SAS configured in RAID-5 with two hotspares.
>>
>> Now, the whole storage shall be shared (single filesystem), here GFS
>> comes in.
>>
>> The Cluster is only 3 nodes large at the moment, more nodes will be
>> added later on. I am currently testing GFS1 and GFS2 for performance.
>> Lock Management is done over single 1Gbit Ethernet Links (1 per
>> machine).
>>
>> Thing is, with GFS1 I get far better performance than with the newer
>> GFS2 across the board, with a few tunable parameters set, for writes
>> GFS1 is roughly twice as fast.
>>
>>     
> What tests are you running? GFS2 is generally faster than GFS1 except
> for streaming writes, which is an area that we are putting some effort
> into solving currently. Small writes (one fs block (4k default) or less)
> on GFS2 are much faster than on GFS1.
>
>   
>> But, concurrent reads are totally abysmal. The total write performance
>> (all nodes combined) sits around 280-330Mbyte/sec, whereas the
>> READ performance is as low as 30-40Mbyte/sec when doing concurrent
>> reads. Surprisingly, single-node read is somewhat ok at 180Mbyte/sec,
>> but as soon as several nodes are reading from GFS (version 1 at the
>> moment) at the same time,  things turn ugly.
>>
>>     
> Reads on GFS2 should be much faster than GFS1, so it sounds as if
> something isn't working correctly for some reason. For cached data,
> reads on GFS2 should be as fast as ext2/3 since the code path is
> identical (to the page cache) and only changes if pages are not cached.
> GFS1 does its locking at a higher level, so there will be more overhead
> for cached reads in general.
>
> Do make sure that if you are preparing the test files for reading all
> from one node (or even just a different node to the one on which you are
> running the read tests), you sync them to disk on that node before
> starting the tests to avoid issues with caching.
>
>   
>> This is strange, because for writes, global performance across the
>> cluster increases slightly when adding more nodes. But for reads, the
>> opposite seems to be true.
>>
>> For read and write tests, separate testfiles were created and read for
>> each node, with each testfile sitting in its own subdirectory, so no
>> node would access another node's file.
>>
>>     
> That sounds like a good test set up to me.
>
>   
>> GFS1 created with the following mkfs.gfs parameters:
>> "-b 4096 -J 128 -j 16 -r 2048 -p lock_dlm"
>> (4kB blocksize, 16 * 128MB journals, 2GB resource groups,
>> Distributed LockManager)
>>
>> Mount Options set: "noatime,nodiratime,noquota"
>>
>> Tunables set: "glock_purge 50, statfs_slots 128, statfs_fast 1, 
>> demote_secs 20"
>>     
> You shouldn't normally need to set the glock_purge and demote_secs to
> anything other than the default. These settings no longer exist in GFS2
> since it makes use of the shrinker subsystem provided by the VM and is
> auto-tuning. If your workload is metadata heavy, you could try boosting
> the journal size and/or the incore_log_blocks tunable.
>
>   
>> Also, in /etc/cluster/cluster.conf, I added this:
>> <dlm plock_ownership="1" plock_rate_limit="0"/>
>> <gfs_controld plock_rate_limit="0"/>
>>
>> Any ideas on how to figure out what's going wrong, and how to
>> tune GFS1 for better concurrent read performance, or tune GFS2
>> in general to be competitive/better than GFS1?
>>
>> I'm dreaming about 300MB/sec read, 300MB/sec write sequentially
>> and somewhat good reaction times while under heavy sequential
>> and/or random load. But for now, I just wanna get the seq reading
>> to work acceptably fast.
>>
>> Thanks a lot for your help!
>>
>>     
> Can you try doing some I/O direct to the block device so that we can get
> an idea of what the raw device can manage? Using dd both read and write,
> across the nodes (different disk locations on each node to simulate
> different files).
>
> I'm wondering if the problem might be due to the seek pattern generated
> by the multiple read locations,
>
> Steve.
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>   
-- 
Michael Lackner
Chair of Information Technology, University of Leoben
IT Administration
michael.lackner at mu-leoben.at | +43 (0)3842/402-1505
