[Linux-cluster] weird happenings on my cluster and another panic.

Sat Oct 28 03:14:02 UTC 2006

I'm 99% sure that these disks are in the shared/clustered mode.
I'll update my rpms from http://mirror.centos.org/centos/4/csgfs/i386/RPMS/
and see what I get.

Jason

On Fri, Oct 27, 2006 at 12:06:01PM -0400, Lon Hohberger wrote:
> On Thu, 2006-10-26 at 21:03 -0400, jason at monsterjam.org wrote:
> 
> > Oct 25 20:31:14 tf1 rpcidmapd: rpc.idmapd startup succeeded
> > Oct 25 20:31:14 tf1 kernel:   Vendor: DELL      Model: PERC 4/DC         Rev: 351X
> > Oct 25 20:31:14 tf1 kernel:   Type:   Processor                          ANSI SCSI revision: 02
> > Oct 25 20:31:14 tf1 kernel: scsi[1]: scanning scsi channel 1 [Phy 1] for non-raid devices
> > Oct 25 20:31:14 tf1 kernel:   Vendor: DELL      Model: PERC 4/DC         Rev: 351X
> > Oct 25 20:31:14 tf1 kernel:   Type:   Processor                          ANSI SCSI revision: 02
> > Oct 25 20:31:14 tf1 kernel:   Vendor: DELL      Model: PV22XS            Rev: E.17
> > Oct 25 20:31:14 tf1 kernel:   Type:   Processor                          ANSI SCSI revision: 03
> > Oct 25 20:31:14 tf1 kernel: scsi[1]: scanning scsi channel 2 [virtual] for logical drives
> > Oct 25 20:31:14 tf1 kernel:   Vendor: MegaRAID  Model: LD 0 RAID5  139G  Rev: 351X
> > Oct 25 20:31:14 tf1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
> > Oct 25 20:31:14 tf1 kernel: scsi1 (2,0,0) : reservation conflict
> 
> Those things are in "cluster mode", right?
> 
> 
> > Oct 25 20:31:14 tf1 kernel: sdb: asking for cache data failed
> > Oct 25 20:31:14 tf1 kernel: sdb: assuming drive cache: write through
> > Oct 25 20:31:14 tf1 kernel:  sdb: sdb1
> > Oct 25 20:31:14 tf1 kernel: Attached scsi disk sdb at scsi1, channel 2, id 0, lun 0
> > Oct 25 20:31:14 tf1 kernel: Adaptec aacraid driver (1.1-5[2412])
> > Oct 25 20:31:14 tf1 kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel at redhat.com
> > Oct 25 20:31:14 tf1 kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
> > Oct 25 20:31:14 tf1 kernel: EXT3-fs: write access will be enabled during recovery.
> > 
> > So my guess is that sdb is the GFS volume and is already locked by the other server at this point.
> 
> GFS doesn't do SCSI reservations.  Both nodes need concurrent write
> access to the disks.  More to the point, see below...
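
A quick way to see which device keeps hitting this is to pull the
"reservation conflict" lines out of the syslog. The following is only a
rough sketch: the function name is made up for illustration, and reading
the bracketed triple as (channel, id, lun) is an assumption based on the
"Attached scsi disk sdb at scsi1, channel 2, id 0, lun 0" line in the
same log.

```python
import re

# Matches kernel lines such as:
#   "Oct 25 20:31:14 tf1 kernel: scsi1 (2,0,0) : reservation conflict"
# Assumption: the triple in parentheses is (channel, id, lun).
CONFLICT_RE = re.compile(
    r"kernel: scsi(?P<host>\d+) "
    r"\((?P<channel>\d+),(?P<id>\d+),(?P<lun>\d+)\) : reservation conflict"
)


def find_reservation_conflicts(lines):
    """Return a (host, channel, id, lun) tuple for each conflict line."""
    hits = []
    for line in lines:
        m = CONFLICT_RE.search(line)
        if m:
            hits.append(
                tuple(int(m.group(k)) for k in ("host", "channel", "id", "lun"))
            )
    return hits


# Two lines copied from the log quoted in this thread.
sample = [
    "Oct 25 20:31:14 tf1 kernel: scsi1 (2,0,0) : reservation conflict",
    "Oct 25 20:31:14 tf1 kernel: sdb: asking for cache data failed",
]
print(find_reservation_conflicts(sample))  # [(1, 2, 0, 0)]
```

Since GFS itself does not take SCSI reservations, any hit here points at
the controller or enclosure setup rather than at the filesystem.
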
> 
> > Oct 25 20:36:13 tf1 kernel: ------------[ cut here ]------------
> > ...
> > Oct 25 20:36:13 tf1 kernel:  <0>Fatal exception: panic in 5 seconds
> 
> ^^^ Argh.
> 
> > So my question now: it appears that I have something misconfigured. tf1 should come up as secondary
> > while tf2 is running as primary, right? Or should tf1 come up and take over as primary, with tf2 letting it?
> 
> Irrespective of anything you did (or didn't do), the panic above is a
> bug in cman (or maybe the kernel, but not likely).
> 
> ... The node panicked trying to start up the cluster software, before
> GFS (or rgmanager, or dlm) was even in the picture.  You'll note that in
> the modules list, 'gfs' and 'dlm' are not even listed.
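
The same check can be done mechanically on a saved oops. This is a
minimal sketch, assuming the panic output contains the usual
"Modules linked in:" line. The sample line below is hypothetical (the
real module list was trimmed from this thread), and both function names
are made up for illustration.

```python
def modules_in_oops(line):
    """Return the set of module names from a 'Modules linked in:' line."""
    marker = "Modules linked in:"
    _, found, rest = line.partition(marker)
    if not found:
        return set()
    # Some kernels append a flag suffix such as "(U)" to a module name;
    # keep only the part before the parenthesis.
    return {name.split("(")[0] for name in rest.split()}


def missing_modules(line, wanted=("cman", "dlm", "gfs")):
    """List the wanted modules that were not loaded at panic time."""
    loaded = modules_in_oops(line)
    return [m for m in wanted if m not in loaded]


# Hypothetical line for illustration, NOT taken from the original panic.
sample = "Oct 25 20:36:13 tf1 kernel: Modules linked in: cman(U) md5 ipv6 ext3 jbd"
print(missing_modules(sample))  # ['dlm', 'gfs']
```

If 'gfs' and 'dlm' come back as missing, the panic happened before
either of them was loaded.
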
> 
> I hope the newer cman-kernel / dlm-kernel fixes it ;) 
> 
> -- Lon

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================