[Linux-cluster] slow I/O on v7000 SAN

Michael Bubb mbubb at collective.com
Wed Nov 16 17:27:09 UTC 2011


Regarding the mention of the Red Hat and IBM tickets - I wasn't really
complaining about either support team. They have been fairly responsive;
we are just not finding the underlying cause.

Red Hat support is pretty good - I have seen much worse. I didn't mean
to come off as annoyed with RH tech support.

I really can't see where the issue is. At first I thought it might be a
fencing issue or a gfs2 tuning issue (plocks, etc.).
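The plock tunables, for instance, are set on the <dlm> element in
cluster.conf on RHEL 6, if I am reading the dlm_controld docs right -
something like this (values are illustrative, not recommendations):

    <!-- fragment of /etc/cluster/cluster.conf; illustrative values -->
    <dlm plock_ownership="1" plock_rate_limit="0"/>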

Recently I started looking at it from the SAN side, wondering if
something was amiss there.

On 11/16/2011 12:07 PM, Michael Bubb wrote:
> 
> 
> On 11/16/2011 11:54 AM, Steven Whitehouse wrote:
>> Hi,
>>
>> On Wed, 2011-11-16 at 11:42 -0500, Michael Bubb wrote:
>>> Hello -
>>>
>>> We are experiencing extreme I/O slowness on a gfs2 volume on a SAN.
>>>
>>> We have a:
>>>
>>> Netezza TF24
>>> IBM V7000 SAN
>>> IBM Bladecenter with 3 HS22 blades
>>> Stand alone HP DL380 G7 server
>>>
>>>
>>> The 3 blades and the HP DL380 are clustered using RHEL 6.1 and
>>> Cluster Suite 5.5.
>>>
>> You should ask the Red Hat support team about this, as they should be
>> able to help.
> 
> I assumed that would go without saying. I have had a ticket open for a
> while, but it is not going anywhere. I have sent sosreports, etc... I
> also have a ticket in with IBM regarding the SAN. After a day they came
> back and told me "Yes, you have I/O slowness...."
> 
> So I thought I would hit the community and see if this rings a bell with
> anyone else.
>>
>>> We have 2 clustered volumes on different storage pools (one has 10k
>>> drives, the other 7.2k).
>>>
>>> We have an internal test that reads a large file (950G) using fopen and
>>> mmap. On a standalone server in a datacenter (Ubuntu, RAID 5, 10k disks)
>>> the test takes approximately 75 seconds to run.
>>>
>>> On the blades the test takes 300 - 350 seconds.
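(For context, the test is essentially along these lines - a simplified
sketch, not our actual code; the path and checksum logic are
illustrative:)

    /* Simplified sketch of the read test: map the file and stream
     * through it once. Path and checksum are illustrative only. */
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *fp = fopen("/mnt/gfs2/bigfile.dat", "r"); /* placeholder */
        if (!fp) { perror("fopen"); return 1; }

        struct stat st;
        if (fstat(fileno(fp), &st) != 0) { perror("fstat"); return 1; }

        unsigned char *p = mmap(NULL, st.st_size, PROT_READ,
                                MAP_PRIVATE, fileno(fp), 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(p, st.st_size, MADV_SEQUENTIAL); /* readahead hint */

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)   /* sequential pass */
            sum += p[i];
        printf("sum: %lu\n", sum);

        munmap(p, st.st_size);
        fclose(fp);
        return 0;
    }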
>>>
>>> I have been looking at the cluster conf and any gfs2 tuning I can
>>> find. I am not really sure what I should post here.
>>>
>>> yrs
>>>
>>> Michael
>>>
>> So it is a streaming data test. Are you running it on all three nodes
>> at the same time, or just one, when you get the 300 second times? Did
>> you mount with noatime,nodiratime set?
> 
> We tested first on all nodes, then on one node. The results are
> remarkably consistent: we have run this test about 10 times in
> different scenarios, and it is always about 5 times slower than the
> Ubuntu SATA RAID 5 volume.
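For reference, noatime/nodiratime would go on the gfs2 line in
/etc/fstab, e.g. (device and mountpoint below are placeholders):

    /dev/vg_san/lv_data  /mnt/gfs2  gfs2  noatime,nodiratime  0 0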
> 
>>
>> Are the drives you are using effectively just a straight linear LVM
>> volume from a JBOD?
> They are RAID 6 mdisks on the SAN, organized into storage pools. The
> volumes are created on top of these.
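From the host side the stack can be inspected with the usual tools,
e.g. (commands only, no output shown):

    multipath -ll   # multipath devices presented by the V7000
    pvs; vgs; lvs   # LVM layout on top of the mpath devices
    dmsetup table   # device-mapper mapping of each LV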
>>
>> Steve.

-- 


Michael Bubb
System Administrator
..................................................
T: +1.646.380-2738  | C: +1.646.783.8769
mbubb at collective.com  | michael.bubb at gmail.com
..................................................
Collective | collective.com | Twitter: @collectivesays
99 Park Ave. 5th Floor |  New York, NY 10016 | +1.888.460.9513
..................................................
