[Linux-cluster] Throughput drops with VMware GFS2 cluster when using fence_scsi

Thu Feb 23 16:56:57 UTC 2012

Hi.

I'm testing a two-node virtual-host CentOS 6.2 (2.6.32-220.4.2.el6.x86_64)
GFS2 cluster running on the following hardware:

Two physical hosts, running VMware ESXi 5.0.0
EqualLogic PS6000XV iSCSI SAN

I have exported a 200GB shared LUN that the virtual hosts have mounted
as a Mapped Raw LUN (physical compatibility mode) using the LSI Logic
Parallel adapter.  The hosts are using clvmd.

When I have not explicitly set any fence devices, the throughput is
quite fast.  Testing the nodes concurrently with bonnie++ (the
sequential block testing is representative of what my real-life
workload will be) shows that it's almost as fast as a "local" ext4
device.

When fence_scsi is used (my cluster.conf is included below), the
throughput drops to roughly 1/10th of the no-fencing test.  Is this
normal?  I've tried enabling SCSI debugging while the test was in
progress, and nothing stood out.  I have tried both manually setting
the arguments to fence_scsi and allowing it to determine them on its
own, with the same results.  I also get the same results with one node
brought down and the other node mounting the filesystem
with lock_nolock.  The node network traffic (through a secondary
interface) is minimal.  I have tried different mount options (noatime)
and I/O schedulers (deadline works best), but they offer only modest
performance gains.

I don't know if the following is normal or worth mentioning, but I have
seen that nodes will sometimes register their keys multiple times.  Here
node1 has done it (but node2 has done it before as well):

# sg_persist --read-keys /dev/sdb
  EQLOGIC   100E-00           5.2
  Peripheral device type: disk
  PR generation=0x1323d, 4 registered reservation keys follow:
    0x131e0001
    0x131e0001
    0x131e0001
    0x131e0002
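
To keep an eye on how often the repeated registrations show up, a small
script along these lines could count the keys in saved sg_persist output.
This is only a sketch: the function name duplicate_keys and the SAMPLE
string are made up for illustration, and it assumes the key lines look
exactly like the output above (one "0x..." value per line).

```python
from collections import Counter


def duplicate_keys(sg_persist_output):
    """Return {key: count} for every reservation key listed more than once.

    Assumes the layout of `sg_persist --read-keys` shown above: each
    registered key sits on its own line and starts with "0x".
    """
    keys = [
        line.strip().lower()
        for line in sg_persist_output.splitlines()
        if line.strip().lower().startswith("0x")
    ]
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Output pasted from the sg_persist run above.
SAMPLE = """\
  EQLOGIC   100E-00           5.2
  Peripheral device type: disk
  PR generation=0x1323d, 4 registered reservation keys follow:
    0x131e0001
    0x131e0001
    0x131e0001
    0x131e0002
"""

print(duplicate_keys(SAMPLE))
```

For the output above this reports node1's key (0x131e0001) with a count
of 3 and leaves node2's key out, since it appears only once.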

I'd be interested in hearing if anyone else has experienced poor
throughput with fence_scsi, or if this is a result of my
misconfiguration of cluster.conf.  I had wanted to do in-band fencing to
simplify my configuration, but I will consider out-of-band fencing
(perhaps using VMware) if I can't resolve this issue.

Thanks.

  Best regards,
    Greg

No explicit fencing:
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
node1      15744M 70741  82 130502  25 45511  10 53395  75 102164   4  1162   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1806  28 +++++ +++ 12847  54  1755  31 +++++ +++ 12129  52
node1,15744M,70741,82,130502,25,45511,10,53395,75,102164,4,1161.5,3,16,1806,28,+++++,+++,12847,54,1755,31,+++++,+++,12129,52

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
node2        15744M 71529  82 122064  20 45717   7 62156  68 102410   5 892.6   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1663  27 +++++ +++ 16188  59  1685  30 +++++ +++ 10400  52
node2,15744M,71529,82,122064,20,45717,7,62156,68,102410,5,892.6,2,16,1663,27,+++++,+++,16188,59,1685,30,+++++,+++,10400,52

With fence_scsi fencing:
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
node1      15744M  9939  12 10372   2  5042   1 11753  17 12220   0 756.4   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   587  10 +++++ +++ 10646  44   597  12 +++++ +++ 11742  52
node1,15744M,9939,12,10372,2,5042,1,11753,17,12220,0,756.4,2,16,587,10,+++++,+++,10646,44,597,12,+++++,+++,11742,52

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
node2        15744M 10335  12 10740   1  4960   0 11761  13 12197   0 730.9   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   603  10 +++++ +++ 11602  49   610  11 +++++ +++ 12739  51
node2,15744M,10335,12,10740,1,4960,0,11761,13,12197,0,730.9,1,16,603,10,+++++,+++,11602,49,610,11,+++++,+++,12739,51
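
To put a number on the "1/10th" figure, the sequential block rates can be
pulled out of the CSV summary lines above and compared.  This is a rough
sketch, not part of bonnie++: the names RESULTS, block_rates and
fenced_ratio are hypothetical, and the column positions are simply read
off the 1.03 CSV lines in this post.

```python
# Column positions in the bonnie++ 1.03 CSV summary lines shown above
# (0 = machine name, 1 = size, then value/%CP pairs).
BLOCK_OUT = 4   # sequential block output, K/sec
BLOCK_IN = 10   # sequential block input, K/sec

# (no explicit fencing, fence_scsi) CSV lines, copied from the runs above.
RESULTS = {
    "node1": (
        "node1,15744M,70741,82,130502,25,45511,10,53395,75,102164,4,1161.5,3,"
        "16,1806,28,+++++,+++,12847,54,1755,31,+++++,+++,12129,52",
        "node1,15744M,9939,12,10372,2,5042,1,11753,17,12220,0,756.4,2,"
        "16,587,10,+++++,+++,10646,44,597,12,+++++,+++,11742,52",
    ),
    "node2": (
        "node2,15744M,71529,82,122064,20,45717,7,62156,68,102410,5,892.6,2,"
        "16,1663,27,+++++,+++,16188,59,1685,30,+++++,+++,10400,52",
        "node2,15744M,10335,12,10740,1,4960,0,11761,13,12197,0,730.9,1,"
        "16,603,10,+++++,+++,11602,49,610,11,+++++,+++,12739,51",
    ),
}


def block_rates(csv_line):
    """Return (block write, block read) in K/sec from one CSV line."""
    fields = csv_line.strip().split(",")
    return int(fields[BLOCK_OUT]), int(fields[BLOCK_IN])


def fenced_ratio(no_fence_line, fenced_line):
    """Return fenced throughput as a fraction of the no-fencing throughput."""
    base_out, base_in = block_rates(no_fence_line)
    fenced_out, fenced_in = block_rates(fenced_line)
    return fenced_out / base_out, fenced_in / base_in


for node, (base, fenced) in RESULTS.items():
    out_ratio, in_ratio = fenced_ratio(base, fenced)
    print(f"{node}: block write {out_ratio:.1%}, "
          f"block read {in_ratio:.1%} of the no-fencing rate")
```

On the numbers above, block writes with fence_scsi come out at roughly
8 to 9% of the no-fencing rate and block reads at about 12%, on both nodes.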

/etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster name="centralcluster" config_version="2">

  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="scsi">
          <device name="scsi_dev" key="131e0001" action="off"/>
        </method>
      </fence>
      <unfence>
        <device name="scsi_dev" key="131e0001" action="on"/>
      </unfence>
    </clusternode>
    <clusternode name="node2" votes="1" nodeid="2">
      <fence>
        <method name="scsi">
          <device name="scsi_dev" key="131e0002" action="off"/>
        </method>
      </fence>
      <unfence>
        <device name="scsi_dev" key="131e0002" action="on"/>
      </unfence>
    </clusternode>
  </clusternodes>

  <fencedevices>
    <fencedevice agent="fence_scsi" name="scsi_dev" devices="/dev/sdb"/>
  </fencedevices>

  <rm>
    <failoverdomains/>
    <resources/>
  </rm>

  <logging>
    <logging_daemon name="corosync" debug="on"/>
    <logging_daemon name="fenced" debug="on"/>
    <logging_daemon name="dlm_controld" debug="on"/>
    <logging_daemon name="gfs_controld" debug="on"/>
  </logging>

</cluster>
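
In case it helps anyone check the file for plain typos, here is a minimal
sketch of a consistency check over the fencing parts of a cluster.conf
like the one above.  It is not a validator for the real schema: the
function name check_fence_keys is hypothetical, CONF is a trimmed copy of
my file, and the three rules it checks (fence and unfence keys match per
node, keys differ between nodes, every device name is defined under
fencedevices) are my own assumptions about what "consistent" means here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above (fencing-related parts only).
CONF = """<cluster name="centralcluster" config_version="2">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="scsi">
          <device name="scsi_dev" key="131e0001" action="off"/>
        </method>
      </fence>
      <unfence>
        <device name="scsi_dev" key="131e0001" action="on"/>
      </unfence>
    </clusternode>
    <clusternode name="node2" votes="1" nodeid="2">
      <fence>
        <method name="scsi">
          <device name="scsi_dev" key="131e0002" action="off"/>
        </method>
      </fence>
      <unfence>
        <device name="scsi_dev" key="131e0002" action="on"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="scsi_dev" devices="/dev/sdb"/>
  </fencedevices>
</cluster>"""


def check_fence_keys(conf_xml):
    """Return a list of human-readable problems found in the fencing config."""
    root = ET.fromstring(conf_xml)
    defined = {dev.get("name") for dev in root.iter("fencedevice")}
    problems = []
    seen_keys = {}  # key -> name of the first node that used it

    for node in root.iter("clusternode"):
        name = node.get("name")
        fence_devs = node.findall("./fence/method/device")
        unfence_devs = node.findall("./unfence/device")
        fence_keys = {dev.get("key") for dev in fence_devs}
        unfence_keys = {dev.get("key") for dev in unfence_devs}

        # Assumed rule 1: a node fences and unfences with the same key(s).
        if fence_keys != unfence_keys:
            problems.append(f"{name}: fence keys {sorted(map(str, fence_keys))} "
                            f"!= unfence keys {sorted(map(str, unfence_keys))}")

        # Assumed rule 2: no two nodes share a key.
        for key in fence_keys | unfence_keys:
            if key is None:
                continue
            owner = seen_keys.setdefault(key, name)
            if owner != name:
                problems.append(f"{name}: key {key} already used by {owner}")

        # Assumed rule 3: every referenced device is defined in <fencedevices>.
        for dev in fence_devs + unfence_devs:
            if dev.get("name") not in defined:
                problems.append(f"{name}: undefined fence device {dev.get('name')}")

    return problems


print(check_fence_keys(CONF) or "no problems found")
```

My configuration passes these three checks, so if the slowdown is caused
by cluster.conf it is presumably something other than a mismatched key or
device name.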