[Linux-cluster] Expanding a LUN and a GFS2 filesystem

Wes Modes wmodes at ucsc.edu
Tue Jan 24 21:01:53 UTC 2012


I am running CentOS with a GFS2 filesystem on a Dell EqualLogic SAN.  I
created the filesystem by mapping an RDM through VMware to the guest
OS.  I used pvcreate, vgcreate, lvcreate, and mkfs.gfs2 to create the
filesystem and the underlying architecture.  I've included the log I
created to document the process below.

I've already increased the size of the LUN on the SAN.  Now, how do I
increase the size of the GFS2 filesystem and the LVM logical volume beneath
it?  Do I need to do something with the PV and VG as well?
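
For reference, here is my rough guess at the sequence, which I haven't tried
yet (device, VG/LV, and mount point names are the ones from my setup below).
Please tell me if I'm missing a step:

    # make the guest re-read the new size of the RDM (it shows up as /dev/sdb here)
    echo 1 > /sys/block/sdb/device/rescan

    # grow the PV onto the enlarged device; as far as I can tell the VG picks
    # up the new free space automatically, so no separate vgextend is needed
    pvresize /dev/sdb

    # grow the LV into the new free space
    lvextend -l +100%FREE /dev/gdcache_vg/gdcache_lv

    # grow the mounted GFS2 filesystem (run on one node only, with /data mounted)
    gfs2_grow /data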

Thanks in advance for your help.

Wes


Here is the log of the process I used to create the filesystem:

    With the RDM created and all the daemons (luci, ricci, cman) started,
    I can now configure GFS.  Make sure the daemons are running on all of
    the nodes.  We can even see the RDM on the guest systems:

    [root at test03]# ls /dev/sdb
    /dev/sdb

    [root at test04]# ls /dev/sdb
    /dev/sdb

    So we are doing this using LVM clustering, following:
    http://emrahbaysal.blogspot.com/2011/03/gfs-cluster-on-vmware-vsphere-rhel.html
    and http://linuxdynasty.org/215/howto-setup-gfs2-with-clustering/
     
    We've already set up the GFS daemons and fencing and whatnot.
    Before we start to create the LVM2 volumes and proceed to GFS2, we
    will need to enable clustering in LVM2.

    [root at test03]# lvmconf --enable-cluster
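
    As I understand it, all that command really does is switch LVM over to
    clustered locking in /etc/lvm/lvm.conf on the local node, i.e. something
    like:

    locking_type = 3    # cluster-wide locking via clvmd

    so it needs to be run on every node.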

    I try to create the cluster FS:
    [root at test03]# pvcreate /dev/sdb
      connect() failed on local socket: No such file or directory
      Internal cluster locking initialisation failed.
      WARNING: Falling back to local file-based locking.
      Volume Groups with the clustered attribute will be inaccessible.
      Physical volume "/dev/sdb" successfully created
    One internet source says:

    >> That indicates that you have cluster locking enabled but that the cluster LVM
    >> daemon (clvmd) is not running.

    So let's start it,

    [root at test03]# service clvmd status
    clvmd is stopped
    [root at test03]# service clvmd start
    Starting clvmd:
    Activating VG(s):   2 logical volume(s) in volume group "VolGroup00" now active
      clvmd not running on node test04
                                                               [  OK  ]
    [root at test03]# chkconfig clvmd on

    Okay, over on the other node:

    [root at test04]# service clvmd status
    clvmd is stopped
    [root at test04]# service clvmd start
    Starting clvmd: clvmd could not connect to cluster manager
    Consult syslog for more information
    [root at test04]# service cman status
    groupd is stopped
    [root at test04]# service cman start
    Starting cluster:
       Loading modules... done
       Mounting configfs... done
       Starting ccsd... done
       Starting cman... done
       Starting daemons... done
       Starting fencing... done
                                                               [  OK  ]
    [root at test04]# chkconfig cman on
    [root at test04]# service luci status
    luci is running...
    [root at test04]# service ricci status
    ricci (pid  4381) is running...
    [root at test04]# chkconfig ricci on
    [root at test04]# chkconfig luci on
    [root at test04]# service clvmd start
    Starting clvmd:
    Activating VG(s):   2 logical volume(s) in volume group "VolGroup00" now active
                                                               [  OK  ]

    And this time, no complaints:

    [root at test03]# service clvmd restart
    Restarting clvmd:                                          [  OK  ]

    Try again with pvcreate:

    [root at test03]# pvcreate /dev/sdb
      Physical volume "/dev/sdb" successfully created

    Create volume group:

    [root at test03]# vgcreate gdcache_vg /dev/sdb
      Clustered volume group "gdcache_vg" successfully created
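
    To double-check that the VG really got the clustered flag, I believe the
    last character of the attribute string reported by vgs should be a "c":

    vgs -o vg_name,vg_attr gdcache_vg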

    Create logical volume:

    [root at test03]# lvcreate -n gdcache_lv -L 2T gdcache_vg
      Logical volume "gdcache_lv" created

    Create GFS filesystem, ahem, GFS2 filesystem.  I screwed this up the
    first time.

    [root at test03]# mkfs.gfs2 -p lock_dlm -t gdcluster:gdcache -j 4 /dev/mapper/gdcache_vg-gdcache_lv
    This will destroy any data on /dev/mapper/gdcache_vg-gdcache_lv.
      It appears to contain a gfs filesystem.

    Are you sure you want to proceed? [y/n] y

    Device:                    /dev/mapper/gdcache_vg-gdcache_lv
    Blocksize:                 4096
    Device Size                2048.00 GB (536870912 blocks)
    Filesystem Size:           2048.00 GB (536870910 blocks)
    Journals:                  4
    Resource Groups:           8192
    Locking Protocol:          "lock_dlm"
    Lock Table:                "gdcluster:gdcache"
    UUID:                      0542628C-D8B8-2480-F67D-081435F38606

    Okay!  And!  Finally!  We mount it!

    [root at test03]# mount /dev/mapper/gdcache_vg-gdcache_lv /data
    /sbin/mount.gfs: fs is for a different cluster
    /sbin/mount.gfs: error mounting lockproto lock_dlm

    Wawawwah.  Bummer.
    /var/log/messages says:

    Jan 19 14:21:05 test03 gfs_controld[3369]: mount: fs requires cluster="gdcluster" current="gdao_cluster"

    Someone on the interwebs concurs:

    the cluster name defined in /etc/cluster/cluster.conf is different
    from the one tagged on the GFS volume.
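
    If I'd wanted to confirm which cluster name the filesystem was actually
    tagged with (rather than trusting the log message), I believe the
    superblock can be read back with gfs2_tool, something like:

    gfs2_tool sb /dev/mapper/gdcache_vg-gdcache_lv table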

    Okay, so looking at cluster.conf:

    [root at test03]# vi /etc/cluster/cluster.conf

    <?xml version="1.0"?>
    <cluster config_version="25" name="gdao_cluster">

    Let's change that to match how I named the cluster in the mkfs.gfs2
    command above:

    [root at test03]# vi /etc/cluster/cluster.conf

    <?xml version="1.0"?>
    <cluster config_version="25" name="gdcluster">

    And restart some stuff:

    [root at test03]# /etc/init.d/gfs2 stop
    [root at test03]# service luci stop
    Shutting down luci: service ricci                          [  OK  ]
    [root at test03]# service ricci stop
    Shutting down ricci:                                       [  OK  ]
    [root at test03]# service cman stop
    Stopping cluster:
       Stopping fencing... done
       Stopping cman... failed
    /usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
                                                               [FAILED]

    [root at test03]# cman_tool leave force

    [root at test03]# service cman stop
    Stopping cluster:
       Stopping fencing... done
       Stopping cman... done
       Stopping ccsd... done
       Unmounting configfs... done
                                                               [  OK  ]

    AAAARRRRGGGHGHHH

    [root at test03]# service ricci start
    Starting ricci:                                            [  OK  ]
    [root at test03]# service luci start
    Starting luci:                                             [  OK  ]

    Point your web browser to https://test03.gdao.ucsc.edu:8084 to access luci

    [root at test03]# service gfs2 start
    [root at test03]# service cman start
    Starting cluster:
       Loading modules... done
       Mounting configfs... done
       Starting ccsd... done
       Starting cman... done
       Starting daemons... done
       Starting fencing... failed

                                                               [FAILED]

    I had to reboot. 

    [root at test03]# service luci status
    luci is running...
    [root at test03]# service ricci status
    ricci (pid  4385) is running...
    [root at test03]# service cman status
    cman is running.
    [root at test03]# service gfs2 status

    Okay, again?

    [root at test03]# mount /dev/mapper/gdcache_vg-gdcache_lv /data

    Did that just work?  And on test04:

    [root at test04]# mount /dev/mapper/gdcache_vg-gdcache_lv /data

    Okay, how about a test:

    [root at test03]# touch /data/killme

    And then we look on the other node:

    [root at test04]# ls /data
    killme

    Holy shit. 
    I've been working so hard for this moment that I don't completely
    know what to do now.
    The question is: now that I have two working nodes, can I duplicate
    this setup?

    Okay, finish up:

    [root at test03]# chkconfig rgmanager on
    [root at test03]# service rgmanager start
    Starting Cluster Service Manager:                          [  OK  ]
    [root at test03]# vi /etc/fstab

    /dev/mapper/gdcache_vg-gdcache_lv /data         gfs2    defaults,noatime,nodiratime 0 0

    and on the other node:

    [root at test04]# chkconfig rgmanager on
    [root at test04]# service rgmanager start
    Starting Cluster Service Manager:
    [root at test04]# vi /etc/fstab

    /dev/mapper/gdcache_vg-gdcache_lv /data         gfs2    defaults,noatime,nodiratime 0 0

     And it works.  Hell, yeah.






