[Linux-cluster] Linux clustering (one-node), GFS, iSCSI, clvmd (lock problem)

Paul Risenhoover prisenhoover at sampledigital.com
Tue Oct 16 18:15:30 UTC 2007



berthiaume_wayne at emc.com wrote:
> I think it's because clvmd is trying to acquire the iSCSI LUNs and the
> iSCSI driver has not come up fully yet. The network layer has to come
> up, then iSCSI, and then there is a separate mount pass with a separate
> filesystem tag, _netdev, that tells mount to wait for these. I'm not sure
> if the same capability exists in LVM to accommodate an iSCSI device
> coming up late. That may be the reason the devices are missed by LVM.
>   
Yes, it is true that iSCSI is not available when the system initially 
boots.  But even after the boot completes and the iSCSI device is 
available, I cannot manually activate the volume.
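
(For reference, the _netdev flag Wayne mentions is a mount option in 
/etc/fstab; a sample entry -- the mount point below is just a placeholder, 
not my real one -- would look like:

/dev/nasvg_00/lvol0   /export/nas   gfs   _netdev,defaults   0 0

As I understand it, the netfs init script picks up _netdev entries after 
the network and iSCSI are up, so it only covers boot-time mount ordering, 
not the clvmd locking error itself.)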

Furthermore, as of this morning, I've hit the same problem again.  I just 
added a new 3 TB physical device.

I performed the following:
pvcreate /dev/sdj
vgextend nasvg_00 /dev/sdj
then...

[root at flax ~]# lvextend -l 2145765 /dev/nasvg_00/lvol0
  Extending logical volume lvol0 to 8.19 TB
  Error locking on node flax: Volume group for uuid not found: 
oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
  Failed to suspend lvol0

I tried modifying lvm.conf to set the locking_type back to 2, but since 
the file system is in use, the change won't take.
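
In case it helps, here is what I plan to check next; I haven't captured the 
output yet, so treat this as a sketch of commands rather than a transcript:

[root at flax ~]# service clvmd status      # is the daemon actually running?
[root at flax ~]# cman_tool status          # is the cluster quorate?
[root at flax ~]# pvs                       # is /dev/sdj really in nasvg_00?
[root at flax ~]# vgdisplay nasvg_00 | grep -i uuid   # does the VG uuid match the one in the error?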

Any thoughts?
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul Risenhoover
> Sent: Tuesday, October 16, 2007 12:52 AM
> To: Linux-cluster at redhat.com
> Subject: [Linux-cluster] Linux clustering (one-node), GFS, iSCSI, clvmd
> (lock problem)
>
> Hi All,
>
> I am new to this mailing list, but I have a locking problem involving 
> Linux clustering and iSCSI that keeps plaguing me.  It's a pretty 
> serious issue: every time I reboot my server, it fails to mount my 
> primary iSCSI device out of the box, and I have to perform several 
> manual steps to get it operational again.
>
> Here is some configuration information:
>
> Linux flax.xxx.com 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:27:41 EDT 
> 2007 i686 i686 i386 GNU/Linux
>
> [root at flax ~]# clvmd -V
> Cluster LVM daemon version: 2.02.21-RHEL4 (2007-04-17)
> Protocol version:           0.2.1
>
> dmesg (excerpted)
> iscsi-sfnet: Loading iscsi_sfnet version 4:0.1.11-3
> iscsi-sfnet: Control device major number 254
> iscsi-sfnet:host3: Session established
> scsi3 : SFNet iSCSI driver
>   Vendor: Promise   Model: VTrak M500i       Rev: 2211
>   Type:   Direct-Access                      ANSI SCSI revision: 04
> sdh : very big device. try to use READ CAPACITY(16).
> SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
> SCSI device sdh: drive cache: write back
> sdh : very big device. try to use READ CAPACITY(16).
> SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
> SCSI device sdh: drive cache: write back
>  sdh: unknown partition table
>
> [root at flax ~]# clustat
> Member Status: Quorate
>
>   Member Name                              Status
>   ------ ----                              ------
>   flax                                     Online, Local, rgmanager
>
> YES, THIS IS A ONE-NODE CLUSTER (which, I suspect, might be the problem)
>
> SYMPTOM:
>
> When the server comes up, the clustered logical volume that is on the 
> iSCSI device is labeled "inactive" when I do an "lvscan:"
> [root at flax ~]# lvscan
>   inactive            '/dev/nasvg_00/lvol0' [5.46 TB] inherit
>   ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
>   ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
>   ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
>   ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
>  
> The interesting thing is that the lgevg_00 and noraidvg_01 volumes are 
> also clustered, but they are direct-attached SCSI (i.e., not iSCSI).
>
> The volume group that the logical volume is a member of shows clean:
> [root at flax ~]# vgscan
>   Reading all physical volumes.  This may take a while...
>   Found volume group "nasvg_00" using metadata type lvm2
>   Found volume group "lgevg_00" using metadata type lvm2
>   Found volume group "noraidvg_01" using metadata type lvm2
>
> So, in order to fix this, I execute the following:
>
> [root at flax ~]# lvchange -a y /dev/nasvg_00/lvol0
> Error locking on node flax: Volume group for uuid not found: 
> oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
>
> This also shows up in my syslog:
> Oct 13 11:27:40 flax vgchange:   Error locking on node flax: Volume 
> group for uuid not found: 
> oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
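>
> For what it's worth, the uuid in that message can be compared against what 
> LVM itself reports for the volume group (vgdisplay prints a "VG UUID" line):
>
> [root at flax ~]# vgdisplay nasvg_00 | grep -i uuid
>
> If the two match, the on-disk metadata looks fine and the failure is on 
> the clvmd locking side rather than a genuinely missing volume group.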
>
> RESOLUTION:
>
> It took me a very long time to figure this out, but since it happens to 
> me every time I reboot my server, somebody's bound to run into this 
> again sometime soon (and it will probably be me).
>
> Here's how I resolved it:
>
> I edited the /etc/lvm/lvm.conf file as follows:
>
> was:
>     # Type of locking to use. Defaults to local file-based locking (1).
>     # Turn locking off by setting to 0 (dangerous: risks metadata
> corruption
>     # if LVM2 commands get run concurrently).
>     # Type 2 uses the external shared library locking_library.
>     # Type 3 uses built-in clustered locking.
>     #locking_type = 1
>     locking_type = 3
>
> changed to:
>
> (snip)
>     # Type 3 uses built-in clustered locking.
>     #locking_type = 1
>     locking_type = 2
>
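> (A note on locking_type = 2: lvm.conf also has a locking_library setting 
> that names the external shared library to load when type 2 is used; I left 
> it at the default from the shipped lvm.conf. In my copy the relevant 
> comment looks roughly like this, so double-check yours:
>
>     # The external locking library to load if locking_type is set to 2.
>     # locking_library = "liblvm2clusterlock.so"
>
> I'm only using type 2 as a temporary way to get the lvchange through, not 
> as a general recommendation.)
>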
> Then, restart clvmd:
> [root at flax ~]# service clvmd restart
>
> Then:
> [root at flax ~]# lvchange -a y /dev/nasvg_00/lvol0
> [root at flax ~]#
>
> (see, no error!)
> [root at flax ~]# lvscan
>   ACTIVE            '/dev/nasvg_00/lvol0' [5.46 TB] inherit
>   ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
>   ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
>   ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
>   ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
>
> (it's active!)
>
> Then go back and modify /etc/lvm/lvm.conf to restore the original 
> locking_type of 3, and restart clvmd one more time.
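>
> For anyone who hits this repeatedly, the whole workaround can be scripted. 
> This is only a sketch of the steps above; it assumes there is exactly one 
> uncommented locking_type line in /etc/lvm/lvm.conf, so check your file 
> before running it:
>
> #!/bin/sh
> # Temporarily switch LVM to locking_type 2, activate the stuck LV,
> # then switch back to built-in clustered locking (type 3).
> sed -i 's/^\( *\)locking_type = 3/\1locking_type = 2/' /etc/lvm/lvm.conf
> service clvmd restart
> lvchange -a y /dev/nasvg_00/lvol0
> sed -i 's/^\( *\)locking_type = 2/\1locking_type = 3/' /etc/lvm/lvm.conf
> service clvmd restart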
>
> THOUGHTS:
>
> I admit I don't know much about clustering, but from the evidence I see, 
> the problem appears to be isolated to clvmd and iSCSI, if only because my 
> direct-attached clustered volumes don't exhibit the symptoms.
>
> I'll make another leap here and guess that it's probably isolated to 
> single-node clusters, since I'd imagine that most people who use 
> clustering are using it as it was intended (i.e., with multiple machines).
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



