[Linux-cluster] Linux clustering (one-node), GFS, iSCSI, clvmd (lock problem)
Paul Risenhoover
prisenhoover at sampledigital.com
Tue Oct 16 18:15:30 UTC 2007
berthiaume_wayne at emc.com wrote:
> I think it's because clvmd is trying to acquire the iSCSI LUNs and the
> iSCSI driver has not come up fully yet. The network layer has to come
> up, then iSCSI; then there is a separate mount pass with a separate
> filesystem tag, _netdev, that tells mount to wait for these. I'm not sure
> if the same capabilities are in LVM to accommodate an iSCSI device
> coming up late. This may be the reason they are missed by LVM.
>
Yes it is true that when the system initially boots, iSCSI is not
available. But even after it boots, and the iSCSI device is available,
I cannot manually make the device active.
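For context, the _netdev deferral mentioned in the quote above is set per-mount in /etc/fstab. A sketch (the device is the one from this thread, but the mount point and options are made up for illustration):

```
/dev/nasvg_00/lvol0  /mnt/nas  gfs  _netdev,defaults  0 0
```

With _netdev, the initscripts skip this entry during the normal fstab pass and mount it via netfs after networking (and iscsi) have started.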
And furthermore, as of this morning, I've got the same problem. I just
added a new 3TB physical device.
Performed the following:
pvcreate /dev/sdj
vgextend nasvg_00 /dev/sdj
then...
[root at flax ~]# lvextend -l 2145765 /dev/nasvg_00/lvol0
Extending logical volume lvol0 to 8.19 TB
Error locking on node flax: Volume group for uuid not found:
oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
Failed to suspend lvol0
I tried modifying lvm.conf to set locking_type back to 2, but since
the file system is in use, the change doesn't take effect.
Any thoughts?
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul Risenhoover
> Sent: Tuesday, October 16, 2007 12:52 AM
> To: Linux-cluster at redhat.com
> Subject: [Linux-cluster] Linux clustering (one-node), GFS, iSCSI,clvmd
> (lock problem)
>
> Hi All,
>
> I am a noob to this mailing list, but I've got some kind of locking
> problem with Linux clustering and iSCSI that plagues me. It's a pretty
> serious issue: every time I reboot my server, it fails to mount my
> primary iSCSI device out of the box, and I have to perform some pretty
> manual operations to get it operational again.
>
> Here is some configuration information:
>
> Linux flax.xxx.com 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:27:41 EDT
> 2007 i686 i686 i386 GNU/Linux
>
> [root at flax ~]# clvmd -V
> Cluster LVM daemon version: 2.02.21-RHEL4 (2007-04-17)
> Protocol version: 0.2.1
>
> dmesg (excerpted)
> iscsi-sfnet: Loading iscsi_sfnet version 4:0.1.11-3
> iscsi-sfnet: Control device major number 254
> iscsi-sfnet:host3: Session established
> scsi3 : SFNet iSCSI driver
> Vendor: Promise Model: VTrak M500i Rev: 2211
> Type: Direct-Access ANSI SCSI revision: 04
> sdh : very big device. try to use READ CAPACITY(16).
> SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
> SCSI device sdh: drive cache: write back
> sdh : very big device. try to use READ CAPACITY(16).
> SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
> SCSI device sdh: drive cache: write back
> sdh: unknown partition table
>
> [root at flax ~]# clustat
> Member Status: Quorate
>
> Member Name Status
> ------ ---- ------
> flax Online, Local, rgmanager
>
> YES, THIS IS A ONE-NODE CLUSTER (Which, I suspect, might be the problem)
>
> SYMPTOM:
>
> When the server comes up, the clustered logical volume that is on the
> iSCSI device is labeled "inactive" when I do an "lvscan:"
> [root at flax ~]# lvscan
> inactive '/dev/nasvg_00/lvol0' [5.46 TB] inherit
> ACTIVE '/dev/lgevg_00/lvol0' [3.55 TB] inherit
> ACTIVE '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
> ACTIVE '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
> ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
>
> The thing that's interesting is that the lgevg_00 and noraidvg_01
> volumes are also clustered, but they are direct-attached SCSI (ie, not
> iSCSI).
>
> The volume group that the logical volume is a member of shows clean:
> [root at flax ~]# vgscan
> Reading all physical volumes. This may take a while...
> Found volume group "nasvg_00" using metadata type lvm2
> Found volume group "lgevg_00" using metadata type lvm2
> Found volume group "noraidvg_01" using metadata type lvm2
>
> So, in order to fix this, I execute the following:
>
> [root at flax ~]# lvchange -a y /dev/nasvg_00/lvol0
> Error locking on node flax: Volume group for uuid not found:
> oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
>
> This also shows up in my syslog, as such:
> Oct 13 11:27:40 flax vgchange: Error locking on node flax: Volume
> group for uuid not found:
> oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
>
> RESOLUTION:
>
> It took me a very long time to figure this out, but since it happens to
> me every time I reboot my server, somebody's bound to run into this
> again sometime soon (and it will probably be me).
>
> Here's how I resolved it:
>
> I edited the /etc/lvm/lvm.conf file as such:
>
> was:
> # Type of locking to use. Defaults to local file-based locking (1).
> # Turn locking off by setting to 0 (dangerous: risks metadata
> corruption
> # if LVM2 commands get run concurrently).
> # Type 2 uses the external shared library locking_library.
> # Type 3 uses built-in clustered locking.
> #locking_type = 1
> locking_type = 3
>
> changed to:
>
> (snip)
> # Type 3 uses built-in clustered locking.
> #locking_type = 1
> locking_type = 2
>
> Then, restart clvmd as such:
> [root at flax ~]# service clvmd restart
>
> Then:
> [root at flax ~]# lvchange -a y /dev/nasvg_00/lvol0
> [root at flax ~]#
>
> (see, no error!)
> [root at flax ~]# lvscan
> ACTIVE '/dev/nasvg_00/lvol0' [5.46 TB] inherit
> ACTIVE '/dev/lgevg_00/lvol0' [3.55 TB] inherit
> ACTIVE '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
> ACTIVE '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
> ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
>
> (it's active!)
>
> Then, go back and modify /etc/lvm/lvm.conf to restore the original
> locking_type to 3
> Then, restart clvmd.
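> The whole workaround above can be sketched as a small script. This is
> only a sketch of the steps described in this thread: the
> set_locking_type helper is a made-up name for illustration, the paths
> assume stock RHEL4, and the cluster commands are left commented out
> because they must be run as root on the affected node.

```shell
#!/bin/sh
# Sketch of the workaround, assuming lvm.conf contains an uncommented
# "locking_type = N" line as shown earlier in this message.

# set_locking_type FILE N -- rewrite the locking_type value in FILE
# (hypothetical helper, named here for illustration only)
set_locking_type() {
    sed -i "s/^\([[:space:]]*locking_type[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$2/" "$1"
}

# The actual sequence (run as root on the affected node):
# set_locking_type /etc/lvm/lvm.conf 2   # external library locking
# service clvmd restart
# lvchange -a y /dev/nasvg_00/lvol0      # now activates without the uuid error
# set_locking_type /etc/lvm/lvm.conf 3   # restore built-in cluster locking
# service clvmd restart
```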
>
> THOUGHTS:
>
> I admit I don't know much about clustering, but from the evidence I
> see, the problem appears to be isolated to clvmd and iSCSI, if only
> because my direct-attached clustered volumes don't exhibit the
> symptoms.
>
> I'll make another leap here and guess that it's probably isolated to
> single-node clusters, since I'd imagine that most people who are using
> clustering are probably using clustering as it was intended to be used
> (ie, multiple machines).
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>