[Linux-cluster] GFS (1 & partially 2) performance problems
Jankowski, Chris
Chris.Jankowski at hp.com
Mon Jun 14 15:09:30 UTC 2010
Michael,
For comparison, could you do your dd(1) tests with a very large block size (1 MB) and tell us the results, please?
I have a vague hunch that the problem may have something to do with whether or not IO operations are being coalesced.
Also, which IO scheduler are you using?
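A minimal sketch of what such a run could look like. The directory and the file size are placeholders so that the example runs anywhere (assumption: point TESTDIR at a per-node directory on the GFS mount and raise COUNT for a real test). The active scheduler is read from sysfs where that is available.

```shell
#!/bin/sh
# Assumption: TESTDIR stands in for a per-node directory on the GFS mount.
TESTDIR="${TESTDIR:-$(mktemp -d)}"
TESTFILE="$TESTDIR/ddtest.bin"
COUNT="${COUNT:-8}"        # number of 1 MB blocks; deliberately tiny here

# Sequential write with 1 MB blocks, then flush dirty pages to disk.
dd if=/dev/zero of="$TESTFILE" bs=1M count="$COUNT"
sync

# Sequential read of the same file with 1 MB blocks.
dd if="$TESTFILE" of=/dev/null bs=1M

# Print the scheduler line for each block device; the active scheduler
# is the one shown in square brackets.
for f in /sys/block/*/queue/scheduler; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done
```

dd reports its own throughput on stderr, which can be compared with the per-port figures from the switch.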
Thanks and regards,
Chris Jankowski
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Lackner
Sent: Tuesday, 15 June 2010 00:22
To: linux clustering
Subject: Re: [Linux-cluster] GFS (1 & partially 2) performance problems
Hello!
Thanks for your reply. Unfortunately I forgot to mention HOW I was actually testing, which was stupid of me.
I tested with dd, doing 4kB blocksize reads and writes, 160GB total testfile size per node.
I read from /dev/zero for writing tests and wrote to /dev/null for reading tests. So, totally sequential, somewhat small blocksize (equal to filesystem BS).
The performance was measured directly on the Fibrechannel Switch, which offers nice per-port monitoring for that purpose.
I have yet to do some serious read testing on GFS2. I aborted my GFS2 tests as write performance was not up to GFS1 to begin with. My older GFS2 benchmarks (I did these with a 2-node configuration before) are lost, so I will need to redo them to give you some numbers.
After each write test I did a "sync" to flush everything to disk. I did not do this before or after read tests, though.
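The test procedure described above can be sketched roughly as follows. This is only an illustration: NODE_DIR is a stand-in for each node's own subdirectory on the GFS mount, and COUNT is tiny here, whereas the real runs used 160GB per node.

```shell
#!/bin/sh
# Assumptions: NODE_DIR stands in for this node's own subdirectory on the
# GFS mount; COUNT is tiny here (the real test files were 160 GB per node).
NODE_DIR="${NODE_DIR:-$(mktemp -d)}"
FILE="$NODE_DIR/testfile"
COUNT="${COUNT:-1024}"     # 1024 x 4 kB = 4 MB

# Write test: stream zeros into the file using 4 kB blocks.
dd if=/dev/zero of="$FILE" bs=4k count="$COUNT"

# Flush dirty pages so the written data actually reaches the array.
sync

# Read test: stream the file back and discard the data.
dd if="$FILE" of=/dev/null bs=4k
```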
As you mentioned journal size: "gfs_tool counters <mountpoint>" reported that only 2-3% of the log space was in use after the tests (I guess this is the per-node fs journal?).
As for the direct I/O tests, by that do you mean testing without ANY caching going on, a synchronous write? What I did before was test EXT3 (~190MB/s) and XFS (~320MB/s) on the Storage Array. I think what I'm getting here is raw throughput, since I am not monitoring in the OS, but at the Fibrechannel Switch itself.
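For reference, a cache-bypassing dd run of the kind asked for might look roughly like the sketch below. Everything named here is an assumption for illustration: TARGET defaults to a scratch file (writing to a real block device destroys whatever is stored there), and NODE_INDEX and REGION_MB are hypothetical knobs that give each node its own region, to simulate different files.

```shell
#!/bin/sh
# Assumptions: TARGET stands in for the shared block device. A scratch file
# is used by default, because writing to a real device DESTROYS its data.
# NODE_INDEX is this node's number; REGION_MB spaces the per-node regions.
TARGET="${TARGET:-$(mktemp)}"
NODE_INDEX="${NODE_INDEX:-1}"
REGION_MB="${REGION_MB:-16}"
COUNT="${COUNT:-4}"                      # 1 MB blocks per run
OFFSET=$((NODE_INDEX * REGION_MB))       # start offset, in 1 MB blocks

# O_DIRECT is not supported on every filesystem, so fall back to
# buffered I/O if the direct write cannot be performed.
if dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" seek="$OFFSET" \
        oflag=direct conv=notrunc 2>/dev/null; then
    # Direct read of the same region, bypassing the page cache.
    dd if="$TARGET" of=/dev/null bs=1M count="$COUNT" skip="$OFFSET" iflag=direct
else
    echo "O_DIRECT not available here, using buffered I/O instead" >&2
    dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" seek="$OFFSET" conv=notrunc
    dd if="$TARGET" of=/dev/null bs=1M count="$COUNT" skip="$OFFSET"
fi
```

Running this with a different NODE_INDEX on each node keeps the nodes on separate disk locations.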
I will do GFS2 read tests similar to those conducted for GFS1. I'll be able to do that tomorrow morning, then I can post the numbers here.
Thanks!
Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2010-06-14 at 14:00 +0200, Michael Lackner wrote:
>
>> Hello!
>>
>> I am currently building a Cluster sitting on CentOS 5 for GFS usage.
>>
>> At the moment, the storage subsystem consists of an HP MSA2312
>> Fibrechannel SAN linked to an FC 8gbit switch. Three client machines
>> are connected to that switch over 8gbit FC. The disks themselves are
>> 12 * 15.000rpm SAS configured in RAID-5 with two hotspares.
>>
>> Now, the whole storage shall be shared (single filesystem), here GFS
>> comes in.
>>
>> The Cluster is only 3 nodes large at the moment, more nodes will be
>> added later on. I am currently testing GFS1 and GFS2 for performance.
>> Lock Management is done over single 1Gbit Ethernet Links (1 per
>> machine).
>>
>> Thing is, with GFS1 I get far better performance than with the newer
>> GFS2 across the board with a few tunable parameters set. For writes,
>> GFS1 is roughly twice as fast.
>>
>>
> What tests are you running? GFS2 is generally faster than GFS1 except
> for streaming writes, which is an area that we are putting some effort
> into solving currently. Small writes (one fs block (4k default) or
> less) on GFS2 are much faster than on GFS1.
>
>
>> But, concurrent reads are totally abysmal. The total write
>> performance (all nodes combined) sits around 280-330Mbyte/sec,
>> whereas the READ performance is as low as 30-40Mbyte/sec when doing
>> concurrent reads. Surprisingly, single-node read is somewhat ok at
>> 180Mbyte/sec, but as soon as several nodes are reading from GFS
>> (version 1 at the moment) at the same time, things turn ugly.
>>
>>
> Reads on GFS2 should be much faster than GFS1, so it sounds as if
> something isn't working correctly for some reason. For cached data,
> reads on GFS2 should be as fast as ext2/3 since the code path is
> identical (to the page cache) and only changes if pages are not cached.
> GFS1 does its locking at a higher level, so there will be more
> overhead for cached reads in general.
>
> Do make sure that if you are preparing the test files for reading all
> from one node (or even just a different node to that on which you are
> running the read tests), you sync them to disk on that node before
> starting the tests to avoid issues with caching.
>
>
>> This is strange, because for writes, global performance across the
>> cluster increases slightly when adding more nodes. But for reads, the
>> opposite seems to be true.
>>
>> For read and write tests, separate testfiles were created and read
>> for each node, with each testfile sitting in its own subdirectory, so
>> no node would access another node's file.
>>
>>
> That sounds like a good test set up to me.
>
>
>> GFS1 created with the following mkfs.gfs parameters:
>> "-b 4096 -J 128 -j 16 -r 2048 -p lock_dlm"
>> (4kB blocksize, 16 * 128MB journals, 2GB resource groups, Distributed
>> LockManager)
>>
>> Mount Options set: "noatime,nodiratime,noquota"
>>
>> Tunables set: "glock_purge 50, statfs_slots 128, statfs_fast 1,
>> demote_secs 20"
>>
> You shouldn't normally need to set the glock_purge and demote_secs to
> anything other than the default. These settings no longer exist in
> GFS2 since it makes use of the shrinker subsystem provided by the VM
> and is auto-tuning. If your workload is metadata heavy, you could try
> boosting the journal size and/or the incore_log_blocks tunable.
>
>
>> Also, in /etc/cluster/cluster.conf, I added this:
>> <dlm plock_ownership="1" plock_rate_limit="0"/> <gfs_controld
>> plock_rate_limit="0"/>
>>
>> Any ideas on how to figure out what's going wrong, and how to tune
>> GFS1 for better concurrent read performance, or tune GFS2 in general
>> to be competitive/better than GFS1?
>>
>> I'm dreaming about 300MB/sec read, 300MB/sec write sequentially and
>> somewhat good reaction times while under heavy sequential and/or
>> random load. But for now, I just wanna get the seq reading to work
>> acceptably fast.
>>
>> Thanks a lot for your help!
>>
>>
> Can you try doing some I/O direct to the block device so that we can
> get an idea of what the raw device can manage? Using dd both read and
> write, across the nodes (different disk locations on each node to
> simulate different files).
>
> I'm wondering if the problem might be due to the seek pattern
> generated by the multiple read locations.
>
> Steve.
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Michael Lackner
Chair of Information Technology, University of Leoben
IT Administration
michael.lackner at mu-leoben.at | +43 (0)3842/402-1505