[Linux-cluster] GFS + CORAID Performance Problem

bigendian+gfs at gmail.com
Sun Dec 10 08:03:10 UTC 2006


I've just set up a new two-node GFS cluster backed by a CORAID sr1520
ATA-over-Ethernet appliance.  Each node has four dual-core Opteron CPUs and
32GB of RAM.  The CORAID unit exports a 1.6TB block device on which I have a
GFS file system.

I seem to be having performance issues where certain read system calls take
up to three seconds to complete.  My test app is bonnie++, and the
slow-downs appear to happen during the "Rewriting" portion of the test,
though I'm not sure the problem is limited to that phase.  If I watch top
and iostat for the device in question, I see activity on the device, then
long (up to three-second) periods of no apparent I/O.  During the periods of
no I/O the bonnie++ process is blocked on disk I/O, so it seems the system
is trying to do something.  Network traces seem to show that the host
machine is not waiting on the RAID array: the first packet after the dead
period always seems to be sent from the host to the CORAID device.
Unfortunately, I don't know how to dig any deeper to figure out what the
problem is.
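
For what it's worth, by "watching top and iostat" I just mean sampling
extended device statistics every couple of seconds while bonnie++ runs,
roughly like this (the AoE block device shows up under etherd/ on my nodes,
if I'm remembering the naming right):

# iostat -x 2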

Below are strace and tcpdump snippets that show what I'm talking about.
Notice the timestamps, and the time spent in each system call shown in <>
brackets after the call (from strace -T).  I'm quite far from a GFS expert,
so please let me know if other data would be helpful.

Any help is much appreciated.

Thanks!
Tom


# tcpdump -s14 -n
...
23:46:40.382119 00:30:48:8b:2b:47 > 00:30:48:57:a7:ed, ethertype Unknown (0x88a2), length 4132:
23:46:40.382131 00:30:48:8b:2b:47 > 00:30:48:57:a7:ed, ethertype Unknown (0x88a2), length 4132:
23:46:43.406173 00:30:48:57:a7:ed > 00:30:48:8b:2b:47, ethertype Unknown (0x88a2), length 60:
23:46:43.406495 00:30:48:8b:2b:47 > 00:30:48:57:a7:ed, ethertype Unknown (0x88a2), length 4132:
23:46:43.406502 00:30:48:8b:2b:47 > 00:30:48:57:a7:ed, ethertype Unknown (0x88a2), length 1060:
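
If a cleaner capture would help, I can re-run tcpdump filtered down to just
the AoE frames and save full packets, something along these lines (the
output file name is arbitrary):

# tcpdump -n -s 0 -w aoe-trace.pcap ether proto 0x88a2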

# strace -p 19845 -T -tt -s 5c
...
23:46:40.380764 write(15, "\301"..., 8192) = 8192 <0.000024>
23:46:40.380814 read(15, "\301"..., 8192) = 8192 <3.026205>
23:46:43.407054 lseek(15, 3899392, SEEK_SET) = 3899392 <0.000006>
23:46:43.407090 write(15, "\301"..., 8192) = 8192 <0.000032>
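
To spot these stalls I'm basically just looking for large values in the <>
timings.  For what it's worth, an awk filter along these lines (the
one-second threshold is arbitrary, and strace.out is just a placeholder for
a file written with strace -o) pulls out the slow calls:

# strace -p 19845 -T -tt -o strace.out
# awk -F'[<>]' '$2+0 > 1' strace.out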



[root@gfs03 ~]# gfs_tool counters /opt/coraid
                                  locks 6651
                             locks held 6585
                          incore inodes 69
                       metadata buffers 2519
                        unlinked inodes 0
                              quota IDs 0
                     incore log buffers 1
                         log space used 0.20%
              meta header cache entries 0
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 5
                   outstanding LM calls 0
                  outstanding BIO calls 1
                       fh2dentry misses 0
                       glocks reclaimed 755
                         glock nq calls 104485385
                         glock dq calls 104426553
                   glock prefetch calls 1
                          lm_lock calls 73715
                        lm_unlock calls 651
                           lm callbacks 81471
                     address operations 112591696
                      dentry operations 257
                      export operations 0
                        file operations 29841238
                       inode operations 1929
                       super operations 28201660
                          vm operations 0
                        block I/O reads 3448075
                       block I/O writes 36719795
[root@gfs03 ~]# gfs_tool gettune /opt/coraid
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100