[Linux-cluster] Problem Installing GFS using GNBD

Tue Jul 18 09:54:38 UTC 2006

Hi All,
 I'm new to Red Hat GFS. I got the GFS code from the
http://sources.redhat.com/cluster site (from CVS,
with tag TRHEL4). I compiled and installed GFS from the source code. I
followed the steps described in the
cluster/doc/min-gfs.txt file. I want to use GFS with a GNBD server on 3
machines.

*The cluster/doc/min-gfs.txt file looks like this:*

Minimum GFS How To
-----------------

The following gfs configuration requires a minimum amount of hardware and
no expensive storage system.  It's the cheapest and quickest way to "play"
with gfs.

  -----------       -----------
  | GNBD    |       | GNBD    |
  | client  |       | client  |       <-- these nodes use gfs
  | node2   |       | node3   |
  -----------       -----------
       |                 |
       -------------------  IP network
                |
           -----------
           | GNBD    |
           | server  |                <-- this node doesn't use gfs
           | node1   |
           -----------

- There are three machines to use with hostnames: node1, node2, node3

- node1 has an extra disk /dev/sda1 to use for gfs
  (this could be hda1 or an lvm LV or an md device)

- node1 will use gnbd to export this disk to node2 and node3

- Node1 cannot use gfs, it only acts as a gnbd server.
  (Node1 will /not/ actually be part of the cluster since it is only
   running the gnbd server.)

- Only node2 and node3 will be in the cluster and use gfs.
  (A two-node cluster is a special case for cman, noted in the config
   below.)

- There's not much point to using clvm in this setup so it's left out.

- Download the "cluster" source tree.

- Build and install from the cluster source tree.  (The kernel components
  are not required on node1 which will only need the gnbd_serv program.)

    cd cluster
    ./configure --kernel_src=/path/to/kernel
    make; make install

- Create /etc/cluster/cluster.conf on node2 with the following contents:

<?xml version="1.0"?>
<cluster name="gamma" config_version="1">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
<clusternode name="node2">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node2"/>
  </method>
 </fence>
</clusternode>

<clusternode name="node3">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node3"/>
  </method>
 </fence>
</clusternode>
</clusternodes>

<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="node1"/>
</fencedevices>

</cluster>

- load kernel modules on nodes

node2 and node3> modprobe gnbd
node2 and node3> modprobe gfs
node2 and node3> modprobe lock_dlm

- run the following commands

node1> gnbd_serv -n
node1> gnbd_export -c -d /dev/sda1 -e global_disk

node2 and node3> gnbd_import -i node1
node2 and node3> ccsd
node2 and node3> cman_tool join
node2 and node3> fence_tool join

node2> gfs_mkfs -p lock_dlm -t gamma:gfs1 -j 2 /dev/gnbd/global_disk

node2 and node3> mount -t gfs /dev/gnbd/global_disk /mnt

- the end, you now have a gfs file system mounted on node2 and node3

Appendix A
----------

To use manual fencing instead of gnbd fencing, the cluster.conf file
would look like this:

<?xml version="1.0"?>
<cluster name="gamma" config_version="1">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
<clusternode name="node2">
 <fence>
  <method name="single">
   <device name="manual" ipaddr="node2"/>
  </method>
 </fence>
</clusternode>

<clusternode name="node3">
 <fence>
  <method name="single">
   <device name="manual" ipaddr="node3"/>
  </method>
 </fence>
</clusternode>
</clusternodes>

<fencedevices>
 <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>

</cluster>

FAQ
---

- Why can't node1 use gfs, too?

You might be able to make it work, but we recommend that you not try.
This software was not intended or designed to allow that kind of usage.

- Isn't node1 a single point of failure? How do I avoid that?

Yes it is.  For the time being, there's no way to avoid that, apart from
not using gnbd, of course.  Eventually, there will be a way to avoid this
using cluster mirroring.

- More info from
  http://sources.redhat.com/cluster/gnbd/gnbd_usage.txt
  http://sources.redhat.com/cluster/doc/usage.txt
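
In the cluster.conf layout quoted above, each clusternode's fence <device name="..."> refers to a <fencedevice name="..."> declared under <fencedevices>. A minimal sketch of checking that cross-reference is given here in Python. The function name undeclared_fence_devices is hypothetical and is not part of the cluster tools; the XML is the how-to's own example.

```python
import xml.etree.ElementTree as ET

# The example cluster.conf from min-gfs.txt, embedded so the sketch is self-contained.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="node2">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node2"/>
  </method>
 </fence>
</clusternode>
<clusternode name="node3">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node3"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="node1"/>
</fencedevices>
</cluster>
"""


def undeclared_fence_devices(conf_text):
    """Return (node name, device name) pairs whose device has no matching fencedevice.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(conf_text)
    declared = {fd.get("name") for fd in root.iter("fencedevice")}
    missing = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in declared:
                missing.append((node.get("name"), dev.get("name")))
    return missing


if __name__ == "__main__":
    # An empty list means every fence device reference resolves.
    print(undeclared_fence_devices(CLUSTER_CONF))
```

The same check applies unchanged to the manual-fencing variant in Appendix A, since only the device and agent names differ.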

*The following commands were executed on node-1:*
[root@localhost ~]# gnbd_serv -n
gnbd_serv: startup succeeded
[root@localhost ~]# gnbd_export -c -d  /dev/sda5 -e global_disk
gnbd_export: created GNBD global_disk serving file /dev/sda5
[root@localhost ~]# gnbd_export -v
Server[1] : global_disk
--------------------------
      file : /dev/sda5
   sectors : 24820362
  readonly : no
    cached : yes
   timeout : no
       uid :

[root@localhost ~]# ps ax| grep gnbd
12571 ?        S      0:00 gnbd_serv -n
12607 ?        S      0:00 gnbd_serv -n
12609 pts/3    S+     0:00 grep gnbd
[root@localhost ~]#

*But I'm getting the following messages in /var/log/messages on node-1 (the GNBD
server machine):*
Jul 18 14:34:06 localhost gnbd_serv[12571]: startup succeeded
Jul 18 14:37:35 localhost gnbd_serv[12571]: server process 12596 exited because of signal 15
Jul 18 14:37:40 localhost gnbd_serv[12571]: server process 12597 exited because of signal 15
Jul 18 14:37:45 localhost gnbd_serv[12571]: server process 12598 exited because of signal 15
Jul 18 14:37:50 localhost gnbd_serv[12571]: server process 12599 exited because of signal 15
Jul 18 14:37:55 localhost gnbd_serv[12571]: server process 12600 exited because of signal 15
Jul 18 14:38:00 localhost gnbd_serv[12571]: server process 12601 exited because of signal 15
Jul 18 14:38:05 localhost gnbd_serv[12571]: server process 12602 exited because of signal 15
Jul 18 14:38:10 localhost gnbd_serv[12571]: server process 12603 exited because of signal 15
Jul 18 14:38:15 localhost gnbd_serv[12571]: server process 12604 exited because of signal 15
Jul 18 14:38:20 localhost gnbd_serv[12571]: server process 12605 exited because of signal 15
Jul 18 14:38:25 localhost gnbd_serv[12571]: server process 12606 exited because of signal 15
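
To make the cadence of the server-side messages above explicit, here is a small sketch in Python that extracts the timestamps of the "exited because of signal" lines and prints the gaps between them. The function name exit_intervals is hypothetical, the sample lines are copied from the log above with the wrapped lines rejoined, and the sketch assumes all entries fall on the same day.

```python
from datetime import datetime

# First four "exited" entries from the node-1 log, rejoined onto single lines.
LOG = """\
Jul 18 14:37:35 localhost gnbd_serv[12571]: server process 12596 exited because of signal 15
Jul 18 14:37:40 localhost gnbd_serv[12571]: server process 12597 exited because of signal 15
Jul 18 14:37:45 localhost gnbd_serv[12571]: server process 12598 exited because of signal 15
Jul 18 14:37:50 localhost gnbd_serv[12571]: server process 12599 exited because of signal 15
"""


def exit_intervals(log_text):
    """Return the gaps in seconds between consecutive 'exited' log entries.

    Hypothetical helper; only the HH:MM:SS field is parsed, so the
    entries are assumed to share one calendar day.
    """
    times = []
    for line in log_text.splitlines():
        if "exited because of signal" in line:
            clock = line.split()[2]  # third field is HH:MM:SS
            times.append(datetime.strptime(clock, "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(exit_intervals(LOG))
```

For the four sample lines the gaps are all 5 seconds, so a server process is being terminated at a fixed interval rather than at random.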

*The following commands were executed on node-2 and node-3:*
[root@localhost ~]# modprobe gnbd
[root@localhost ~]# modprobe gfs
[root@localhost ~]# modprobe lock_dlm
[root@localhost ~]# gnbd_import -n -i 172.16.222.63
gnbd_import: created directory /dev/gnbd
gnbd_import: created gnbd device global_disk
gnbd_recvd: gnbd_recvd started
[root@localhost ~]# ccsd

*And the following messages appear in /var/log/messages on node-2 and node-3 (the GNBD
client machines):*
Jul 18 09:09:19 localhost kernel: gnbd: registered device at major 252
Jul 18 09:09:21 localhost hald[2759]: Timed out waiting for hotplug event 318. Rebasing to 574
Jul 18 09:10:41 localhost kernel: CMAN <CVS> (built Jul 17 2006 09:01:33) installed
Jul 18 09:10:41 localhost kernel: NET: Registered protocol family 30
Jul 18 09:10:41 localhost kernel: Lock_Harness <CVS> (built Jul 17 2006 09:01:49) installed
Jul 18 09:10:41 localhost kernel: gfs: no version for "kcl_get_node_by_nodeid" found: kernel tainted.
Jul 18 09:10:41 localhost kernel: GFS <CVS> (built Jul 17 2006 09:02:14) installed
Jul 18 09:10:57 localhost kernel: DLM <CVS> (built Jul 17 2006 09:01:45) installed
Jul 18 09:10:57 localhost kernel: Lock_DLM (built Jul 17 2006 09:01:53) installed
Jul 18 09:15:03 localhost gnbd_recvd[6334]: gnbd_recvd started
Jul 18 09:15:03 localhost kernel: resending requests
Jul 18 09:15:41 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:15:41 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:15:41 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:15:41 localhost kernel: gnbd0: shutting down socket
Jul 18 09:15:41 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:15:46 localhost kernel: resending requests
Jul 18 09:15:51 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:15:51 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:15:51 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:15:51 localhost kernel: gnbd0: shutting down socket
Jul 18 09:15:51 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:15:56 localhost kernel: resending requests
Jul 18 09:15:58 localhost ccsd[6336]: Starting ccsd DEVEL.1153141288:
Jul 18 09:15:58 localhost ccsd[6336]:  Built: Jul 17 2006 09:02:27
Jul 18 09:15:58 localhost ccsd[6336]:  Copyright (C) Red Hat, Inc.  2004 All rights reserved.
Jul 18 09:16:01 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:01 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:01 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:01 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:01 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:06 localhost kernel: resending requests
Jul 18 09:16:11 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:11 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:11 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:11 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:11 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:16 localhost kernel: resending requests
Jul 18 09:16:21 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:21 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:21 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:21 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:21 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:26 localhost kernel: resending requests
Jul 18 09:16:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 30 seconds.
Jul 18 09:16:31 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:31 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:31 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:31 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:31 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 60 seconds.
Jul 18 09:17:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 90 seconds.
Jul 18 09:17:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 120 seconds.
Jul 18 09:18:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 150 seconds.
Jul 18 09:18:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 180 seconds.
Jul 18 09:19:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 210 seconds.
Jul 18 09:19:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 240 seconds.
Jul 18 09:20:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 270 seconds.
Jul 18 09:20:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 300 seconds.
Jul 18 09:21:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 330 seconds.
Jul 18 09:21:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 360 seconds.
Jul 18 09:22:28 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 390 seconds.

*My /etc/cluster/cluster.conf file looks like this:*

<?xml version="1.0"?>
<cluster name="gamma" config_version="1">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
<clusternode name="172.16.222.128">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.128"/>
  </method>
 </fence>
</clusternode>

<clusternode name="172.16.222.62">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.62"/>
  </method>
 </fence>
</clusternode>
</clusternodes>

<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="172.16.222.63"/>
</fencedevices>

</cluster>
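
The how-to notes that a two-node cluster is a special case for cman, expressed in the config as two_node="1" together with expected_votes="1". As a rough sketch, the Python below checks a cluster.conf for that pairing and for exactly two clusternode entries. The function name two_node_problems is hypothetical, and the rule it encodes is an assumption drawn from the how-to's example rather than from the cman sources.

```python
import xml.etree.ElementTree as ET

# The cluster.conf shown above, embedded so the sketch is self-contained.
MY_CONF = """<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="172.16.222.128">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.128"/>
  </method>
 </fence>
</clusternode>
<clusternode name="172.16.222.62">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.62"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="172.16.222.63"/>
</fencedevices>
</cluster>
"""


def two_node_problems(conf_text):
    """Return a list of messages describing two_node inconsistencies.

    Hypothetical helper; assumes two_node="1" should come with exactly
    two clusternode entries and expected_votes="1".
    """
    root = ET.fromstring(conf_text)
    cman = root.find("cman")
    nodes = root.findall("./clusternodes/clusternode")
    problems = []
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node is set but %d nodes are defined" % len(nodes))
        if cman.get("expected_votes") != "1":
            problems.append("two_node is set but expected_votes is not 1")
    return problems


if __name__ == "__main__":
    # An empty list means the two-node settings are self-consistent.
    print(two_node_problems(MY_CONF))
```

This only inspects the file's internal consistency. It says nothing about whether the node names resolve or whether the cluster daemons on each node can reach one another.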

*If I use GNBD to export a disk partition from node-1 (the GNBD server) and
import that partition with gnbd_import on node-2 and node-3,
then I can create the file system on the exported device.*

*But in the case above it fails.*

*So in the end I am not able to use GFS. If anybody has any idea, please help
me...*

With Regards
Rajesh.