[Linux-cluster] GFS performance

McDowell, Paul N. Paul.McDowell at celera.com
Wed Jul 12 18:51:18 UTC 2006


Hi Britt, 
>Does the GFS lock traffic have its own interface on each server and segmented switch network? 
No, but it's a good question.  Each server actually has three separate NIC interfaces configured (public, private and iLO), but at the time of build, Red Hat did not support creating the cluster to use a separate interface, i.e., the names in the cluster.ccs file had to be consistent with the hostname, and hence all traffic is on the same gigabit ethernet network.     
But even if the GFS lock traffic were using a separate network (and switch), I'm not sure this would be the answer.  The problem I'm seeing is consistent when the systems are busy and when they are quiet.   
Also, if I run several FTP transfers of the same large 275MB file between cluster members across this gigabit network link, it takes a consistent amount of time (between 4 and 8 seconds, i.e., roughly 275-550Mbit/s on a gigabit link), so it does not seem likely that network congestion is the issue. 
>What does the perl script do? 
I've copied the source below.  The script is called slowness.pl :) 
#!/usr/local/bin/perl -W 
##################################################################### 
# slowness.pl 
# 
# Utility to help us figure out where GFS is slow. 
# 
# 
# April 28, 2006 
##################################################################### 
use strict; 
my $NUM_LINES_TO_WRITE = 1000000;   
# fileToWrite is 'slowness' plus PID.  Hope you don't have something named 
# that already 
my $fileToWrite = "slowness." . $$; 
timeThis('Total', \&wholeTest); 
sub wholeTest { 
    print STDOUT "######### Testing writing...\n"; 
    timeThis('Total Writing', \&openingWritingClosing); 
    print STDOUT "######### Testing reading...\n"; 
    timeThis('Total Reading', \&openingReadingClosing); 
    print STDOUT "######### Testing removing...\n"; 
    timeThis('Unlinking', \&unlinkTheFile); 
} 
##################################################################### 
# subroutine that contains everything that needs to be timed 
# it is in a subroutine so that it can be timed (total time) as well. 
##################################################################### 
sub openingWritingClosing { 
    timeThis('    openTheFileForWriting', \&openTheFileForWriting); 
    timeThis('    printToFile', \&printToFile); 
    timeThis('    closeTheFile', \&closeTheFile); 
} 
sub openingReadingClosing { 
    timeThis('    openTheFileForReading', \&openTheFileForReading); 
    timeThis('    readTheFile', \&readTheFile); 
    timeThis('    closeTheFile', \&closeTheFile); 
} 
##################################################################### 
# open the file for writing 
##################################################################### 
sub openTheFileForWriting { 
    open (FILE, ">$fileToWrite") or die "Could not open '$fileToWrite': $!"; 
} 
##################################################################### 
# open the file for reading 
##################################################################### 
sub openTheFileForReading { 
    open (FILE, "<$fileToWrite") or die "Could not open '$fileToWrite': $!"; 
} 
##################################################################### 
# print to the file 
##################################################################### 
sub printToFile { 
    for(my $i = 0; $i < $NUM_LINES_TO_WRITE; $i++) { 
        print FILE "Is this fast or slow?\n"; 
    } 
}     
##################################################################### 
# read the file 
##################################################################### 
sub readTheFile { 
    while (<FILE>) { 
        ;    # read and discard each line; we only care about the I/O time 
    } 
} 
##################################################################### 
# close the file 
##################################################################### 
sub closeTheFile { 
    close FILE or die "Could not close '$fileToWrite': $!"; 
} 
##################################################################### 
# unlink the file 
##################################################################### 
sub unlinkTheFile { 
    unlink $fileToWrite or die "Could not unlink '$fileToWrite': $!"; 
} 
##################################################################### 
# print to stdout 
##################################################################### 
sub printToStdOut { 
    for(my $i = 0; $i < $NUM_LINES_TO_WRITE; $i++) { 
        print STDOUT "Is this fast or slow?\n"; 
    } 
}     
##################################################################### 
# Subroutine that takes a name and a subroutine reference. 
# The execution of the subroutine is timed and the name 
# is printed in the output 
##################################################################### 
sub timeThis { 
    my ($name, $subroutineRef, @rest) = @_; 
    scalar(@rest) && die "timeThis: Too many parameters passed to this subroutine\n"; 
    my $timeStart = time(); 
    $subroutineRef->();    # call the subroutine being timed (no arguments) 
    my $elapsedTime = time() - $timeStart; 
    print $name . ": " . $elapsedTime . " seconds\n"; 
} 
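One caveat with the script as written: Perl's built-in time() has only one-second resolution, so anything faster than a second shows up as 0 seconds.  A minimal higher-resolution variant of timeThis, using the core Time::HiRes module (timeThisHiRes is just an illustrative name, not part of the original script), would look something like this: 

use Time::HiRes qw(gettimeofday tv_interval); 

sub timeThisHiRes { 
    my ($name, $subroutineRef) = @_; 
    my $timeStart = [gettimeofday];             # arrayref of [seconds, microseconds] 
    $subroutineRef->(); 
    my $elapsedTime = tv_interval($timeStart);  # elapsed time as fractional seconds 
    printf "%s: %.3f seconds\n", $name, $elapsedTime; 
} 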
	"Treece, Britt" <Britt.Treece at savvis.net>@SMTP at USRKVEXG01 
	Sent by: linux-cluster-bounces at redhat.com 

	07/12/2006 12:51 PM 
		Please respond to
		linux clustering <linux-cluster at redhat.com>@SMTP at USRKVEXG01

		
		To
					linux clustering <linux-cluster at redhat.com>@SMTP at USRKVEXG01 
		cc

		Subject
					RE: [Linux-cluster] GFS performance




		
	

Paul, 
  
Does the GFS lock traffic have its own interface on each server and segmented switch network? 
  
What does the perl script do? 
  
Regards, 
  
Britt Treece 
  
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul n McDowell 
Sent: Wednesday, July 12, 2006 10:49 AM 
To: linux-cluster at redhat.com 
Subject: [Linux-cluster] GFS performance 
  
Hi, 
Can anyone provide techniques or suggestions to improve GFS performance? 
The problem we have is best summarized by the output of a Perl script that one of our frustrated developers has written: 
The script simply creates a file, then reads it, then removes it, and prints out the time each of these operations takes to complete.  Sometimes all three take only a second; sometimes they take as long as 15 seconds.  I've run the script on all five GFS file systems and see the same response characteristics.  I've run it when the systems are busy and when they are fairly quiet and still see the same issue. 
To summarize the environment: 
The cluster is an 11-node RH ES3 Update 6 (GFS 6.0) environment with 3 redundant lock managers and five GFS file systems mounted on each participating server.  The GFS file systems range from 100GB to 1.5TB. 
The storage array is an EMC CX700 attached to a dual redundant SAN consisting of four 2Gb Brocade 3900 SAN switches.  
The HBAs in all servers are QLogic qla2340s (firmware version 3.03.14, driver version 7.07.00). 
The servers are all HP DL585 64-bit AMD Opteron machines, each configured with between 8GB and 32GB of memory. 
I've raised a support call with Red Hat, but according to their experts our configuration already seems to be set for optimum performance. 
Red Hat provides a utility to get and set tunable GFS file system parameters, but there is next to no supporting documentation. 
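For reference, assuming the utility in question is gfs_tool's gettune/settune interface, the general usage is along these lines; <parameter> and <value> below are placeholders, not recommendations: 
# dump the current tunables for a mounted GFS file system 
gfs_tool gettune /crx 
# change a single tunable on that mount 
gfs_tool settune /crx <parameter> <value> 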
So, is there anything I can do, or am I missing something obvious that is just plainly misconfigured? 
Shown below is the GFS configuration summary derived from lock_gulmd -C and gfs_tool df for each file system. 
I'll be happy to supply any other information if it will help. 
Thanks to all in advance 
Paul McDowell 
lock_gulmd -C 
# hashed: 0x44164246 
cluster { 
 name = "cra_gfs" 
 lock_gulm { 
  heartbeat_rate = 15.000 
  allowed_misses = 3 
  coreport = 40040 
  new_connection_timeout = 15.000 
  # server cnt: 3 
  # servers = ["iclc1g.cra.applera.net", "iclc2g.cra.applera.net", "ccf001g.cra.applera.net"] 
  servers = ["172.20.8.21", "172.20.8.22", "172.20.8.51"] 
  lt_partitions = 4 
  lt_base_port = 41040 
  lt_high_locks = 20971520 
  lt_drop_req_rate = 300 
  prealloc_locks = 5000000 
  prealloc_holders = 11000000 
  prealloc_lkrqs = 60 
  ltpx_port = 40042 
 } 
} 
[root at iclc1g tmp]# gfs_tool df /crx 
/crx: 
  SB lock proto = "lock_gulm" 
  SB lock table = "cra_gfs:cra_crx" 
  SB ondisk format = 1308 
  SB multihost format = 1401 
  Block size = 4096 
  Journals = 11 
  Resource Groups = 1988 
  Mounted lock proto = "lock_gulm" 
  Mounted lock table = "cra_gfs:cra_crx" 
  Mounted host data = "" 
  Journal number = 0 
  Lock module flags = async 
  Local flocks = FALSE 
  Local caching = FALSE 
  Type           Total          Used           Free           use% 
  ------------------------------------------------------------------------ 
  inodes         933593         933593         0              100% 
  metadata       943899         121868         822031         13% 
  data           128274180      58546879       69727301       46% 
[root at iclc1g tmp]# gfs_tool df /crx/data 
/crx/data: 
  SB lock proto = "lock_gulm" 
  SB lock table = "cra_gfs:cra_crxdata" 
  SB ondisk format = 1308 
  SB multihost format = 1401 
  Block size = 4096 
  Journals = 11 
  Resource Groups = 5970 
  Mounted lock proto = "lock_gulm" 
  Mounted lock table = "cra_gfs:cra_crxdata" 
  Mounted host data = "" 
  Journal number = 0 
  Lock module flags = async 
  Local flocks = FALSE 
  Local caching = FALSE 
  Type           Total          Used           Free           use% 
  ------------------------------------------------------------------------ 
  inodes         3296091        3296091        0              100% 
  metadata       2649271        616186         2033085        23% 
  data           385236382      310495360      74741022       81% 
[root at iclc1g tmp]# gfs_tool df /crx/home 
/crx/home: 
  SB lock proto = "lock_gulm" 
  SB lock table = "cra_gfs:cra_crxhome" 
  SB ondisk format = 1308 
  SB multihost format = 1401 
  Block size = 4096 
  Journals = 11 
  Resource Groups = 3978 
  Mounted lock proto = "lock_gulm" 
  Mounted lock table = "cra_gfs:cra_crxhome" 
  Mounted host data = "" 
  Journal number = 0 
  Lock module flags = async 
  Local flocks = FALSE 
  Local caching = FALSE 
  Type           Total          Used           Free           use% 
  ------------------------------------------------------------------------ 
  inodes         3477487        3477487        0              100% 
  metadata       3162164        341627         2820537        11% 
  data           254032093      157709829      96322264       62% 
[root at iclc1g tmp]# gfs_tool df /usr/local 
/usr/local: 
  SB lock proto = "lock_gulm" 
  SB lock table = "cra_gfs:cra_usrlocal" 
  SB ondisk format = 1308 
  SB multihost format = 1401 
  Block size = 4096 
  Journals = 11 
  Resource Groups = 394 
  Mounted lock proto = "lock_gulm" 
  Mounted lock table = "cra_gfs:cra_usrlocal" 
  Mounted host data = "" 
  Journal number = 0 
  Lock module flags = async 
  Local flocks = FALSE 
  Local caching = FALSE 
  Type           Total          Used           Free           use% 
  ------------------------------------------------------------------------ 
  inodes         765762         765762         0              100% 
  metadata       582989         22854          560135         4% 
  data           24393837       9477084        14916753       39% 
[root at iclc1g tmp]# gfs_tool df /data 
/data: 
  SB lock proto = "lock_gulm" 
  SB lock table = "cra_gfs:cra_GQ" 
  SB ondisk format = 1308 
  SB multihost format = 1401 
  Block size = 4096 
  Journals = 11 
  Resource Groups = 1298 
  Mounted lock proto = "lock_gulm" 
  Mounted lock table = "cra_gfs:cra_GQ" 
  Mounted host data = "" 
  Journal number = 0 
  Lock module flags = async 
  Local flocks = FALSE 
  Local caching = FALSE 
  Type           Total          Used           Free           use% 
  ------------------------------------------------------------------------ 
  inodes         10026          10026          0              100% 
  metadata       282680         189037         93643          67% 
  data           103761726      94277221       9484505        91% 