[Linux-cluster] Any thoughts on losing mount?

isplist at logicore.net isplist at logicore.net
Tue Nov 27 18:04:15 UTC 2007


> The error message indicates the resource group (RG) may be corrupted. Have
> you tried running fsck (and did it fix anything)?

Should this be run while the partition is unmounted on all of the nodes?

# ./fsck /dev/mapper/VolGroup03-web
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/mapper/VolGroup03-web

I've also seen this in the log:

compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:qm"
compdev kernel: GFS: fsid=vgcomp:qm.1: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Done
compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
compdev kernel: GFS: fsid=vgcomp:web.3:   RG = 31104599
compdev kernel: GFS: fsid=vgcomp:web.3:   function = gfs_setbit
compdev kernel: GFS: fsid=vgcomp:web.3:   file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71
compdev kernel: GFS: fsid=vgcomp:web.3:   time = 1196105648
compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from the cluster
compdev kernel: GFS: fsid=vgcomp:web.3: waiting for outstanding I/O
compdev kernel: GFS: fsid=vgcomp:web.3: telling LM to withdraw
compdev kernel: lock_dlm: withdraw abandoned memory
compdev kernel: GFS: fsid=vgcomp:web.3: withdrawn

and:

compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
compdev kernel: GFS: fsid=vgcomp:web.3: Done
compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
compdev kernel: GFS: fsid=vgcomp:web.3:   RG = 31104599
compdev kernel: GFS: fsid=vgcomp:web.3:   function = gfs_setbit
compdev kernel: GFS: fsid=vgcomp:web.3:   file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71
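
In case it helps anyone comparing the two excerpts above, here is a minimal Python sketch that pulls the fields of a GFS consistency error (RG, function, file, line, time) out of syslog lines. It assumes each log entry sits on a single line (wrapped lines rejoined) and that the prefix looks like "host kernel: GFS: fsid=...:" as in the excerpts. The name parse_gfs_errors and the regular expressions are made up for illustration and are not part of any GFS tool. Treating the time field as seconds since the Unix epoch is also an assumption.

```python
import re
from datetime import datetime, timezone

# Assumed prefix: "<host> kernel: GFS: fsid=<cluster>:<fs>.<jid>: <message>"
GFS_LINE = re.compile(
    r"^(?P<host>\S+) kernel: GFS: fsid=(?P<fsid>[^:]+:[^:]+): (?P<msg>.*)$"
)
# Detail lines such as "  RG = 31104599" or "  function = gfs_setbit"
DETAIL = re.compile(r"^\s*(?P<key>\w+) = (?P<value>.*)$")


def parse_gfs_errors(lines):
    """Return one dict per 'fatal:' message, with its detail fields attached."""
    errors = []
    current = None
    for line in lines:
        m = GFS_LINE.match(line.rstrip())
        if not m:
            current = None  # any non-GFS line ends the current error block
            continue
        msg = m.group("msg")
        if msg.startswith("fatal:"):
            current = {"fsid": m.group("fsid"),
                       "error": msg[len("fatal:"):].strip()}
            errors.append(current)
            continue
        d = DETAIL.match(msg)
        if current is not None and d and m.group("fsid") == current["fsid"]:
            key, value = d.group("key"), d.group("value")
            if key == "file" and ", line = " in value:
                # "file = <path>, line = <n>" carries two fields on one line
                path, lineno = value.rsplit(", line = ", 1)
                current["file"] = path
                current["line"] = int(lineno)
            elif key in ("RG", "time"):
                current[key] = int(value)
            else:
                current[key] = value
        else:
            current = None  # e.g. "about to withdraw from the cluster"
    return errors


SAMPLE = [
    "compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error",
    "compdev kernel: GFS: fsid=vgcomp:web.3:   RG = 31104599",
    "compdev kernel: GFS: fsid=vgcomp:web.3:   function = gfs_setbit",
    "compdev kernel: GFS: fsid=vgcomp:web.3:   file = /home/xos/gen/updates-2007-11/"
    "xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71",
    "compdev kernel: GFS: fsid=vgcomp:web.3:   time = 1196105648",
    "compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from the cluster",
]

if __name__ == "__main__":
    for err in parse_gfs_errors(SAMPLE):
        print(err["fsid"], err["error"], "RG", err.get("RG"))
        if "time" in err:
            # Assumption: 'time' is seconds since the Unix epoch
            print(datetime.fromtimestamp(err["time"], tz=timezone.utc).isoformat())
```

Under the epoch assumption, time = 1196105648 works out to 2007-11-26 19:34:08 UTC, the day before this message was posted.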

> Also, do you remember any abnormal event (unclean shutdown, panic,
> power loss, etc.) *before* this issue popped up?

Yes, I posted a few things about that recently. The cluster kept dying with
kernel panics until I updated all of the nodes to be identical again. Since
then, this node has been having these problems. I have also noticed that cman
never shuts down correctly when I reboot nodes, and that rebooting produces a
lot of garbage (for lack of a better word) about volume group information
which no longer exists.

Last but not least, I wasn't sure what to post here, so I decided it was
better to post too much than not enough.

compdev rc.sysinit: Checking root filesystem succeeded
compdev kernel: IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
compdev rc.sysinit: Remounting root filesystem in read-write mode:  succeeded
compdev kernel: TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
compdev lvm.static:
compdev kernel: TCP bind hash table entries: 131072 (order: 9, 3670016 bytes)
compdev lvm.static: connect() failed on local socket: Connection refused
compdev kernel: TCP: Hash tables configured (established 131072 bind 131072)
compdev lvm.static:   WARNING: Falling back to local file-based locking.
compdev kernel: Initializing IPsec netlink socket
compdev lvm.static:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: NET: Registered protocol family 1
compdev lvm.static:   1 logical volume(s) in volume group VolGroup03 now active
compdev kernel: NET: Registered protocol family 17
compdev lvm.static:   1 logical volume(s) in volume group VolGroup02 now active
compdev kernel: Freeing unused kernel memory: 168k freed
compdev lvm.static:   1 logical volume(s) in volume group VolGroup01 now active
compdev kernel: SCSI subsystem initialized
compdev rc.sysinit: Setting up Logical Volume Management: succeeded
compdev kernel: QLogic Fibre Channel HBA Driver
compdev rc.sysinit: Checking filesystems succeeded
compdev kernel: qla2200 0000:00:11.0: Found an ISP2200, irq 11, iobase 0xe0816000
compdev rc.sysinit: Mounting local filesystems:  succeeded
compdev kernel: qla2200 0000:00:11.0: Configuring PCI space...
compdev rc.sysinit: Enabling local filesystem quotas:  succeeded
compdev kernel: qla2200 0000:00:11.0: Configure NVRAM parameters...
compdev rc.sysinit: Enabling swap space:  succeeded
compdev kernel: qla2200 0000:00:11.0: Verifying loaded RISC code...
compdev init: Entering runlevel: 3
compdev kernel: qla2200 0000:00:11.0: LIP reset occured (0).
compdev microcode_ctl: microcode_ctl startup succeeded
compdev kernel: qla2200 0000:00:11.0: Waiting for LIP to complete...
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: LOOP UP detected (1 Gbps).
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: Topology - (F_Port), Host Loop address 0xffff
compdev vgchange:
compdev kernel: scsi0 : qla2xxx
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: qla2200 0000:00:11.0:
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel:  QLogic Fibre Channel HBA Driver: 8.01.04-d8
compdev vgchange: Volume group "WARNING:" not found
compdev kernel:   QLogic QLA22xx -
compdev lvm2-monitor: Starting monitoring for VG WARNING:: failed
compdev kernel:   ISP2200: PCI (33 MHz) @ 0000:00:11.0 hdma-, host#=0, fw=2.02.08 TP
compdev vgchange:
compdev kernel:   Vendor: MYLEX     Model: DACARMRB          Rev: 7775
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:0): Enabled tagged queuing, queue depth 16.
compdev vgchange:
compdev kernel: SCSI device sda: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: SCSI device sda: drive cache: write back
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: SCSI device sda: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: Volume group "Falling" not found
compdev kernel: SCSI device sda: drive cache: write back
compdev lvm2-monitor: Starting monitoring for VG Falling: failed
compdev kernel:  sda:
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel:   Vendor: MYLEX     Model: DACARMRB          Rev: 7775
compdev vgchange:
compdev kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:1): Enabled tagged queuing, queue depth 16.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: SCSI device sdb: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: Volume group "back" not found
compdev kernel: SCSI device sdb: drive cache: write back
compdev lvm2-monitor: Starting monitoring for VG back: failed
compdev kernel: SCSI device sdb: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdb: drive cache: write back
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel:  sdb:
compdev vgchange:
compdev kernel: Attached scsi disk sdb at scsi0, channel 0, id 0, lun 1
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel:   Vendor: MYLEX     Model: DACARMRB          Rev: 7775
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
compdev vgchange: Volume group "to" not found
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:2): Enabled tagged queuing, queue depth 16.
compdev lvm2-monitor: Starting monitoring for VG to: failed
compdev kernel: SCSI device sdc: 997449728 512-byte hdwr sectors (510694 MB)
compdev vgchange:
compdev kernel: SCSI device sdc: drive cache: write back
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdc: 997449728 512-byte hdwr sectors (510694 MB)
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdc: drive cache: write back
compdev vgchange:
compdev kernel:  sdc:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 2
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "local" not found
compdev kernel: device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel at redhat.com
compdev lvm2-monitor: Starting monitoring for VG local: failed
compdev kernel: kjournald starting.  Commit interval 5 seconds
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: EXT3-fs: mounted filesystem with ordered data mode.
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: SELinux:  Disabled at runtime.
compdev vgchange:
compdev kernel: SELinux:  Unregistering netfilter hooks
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: inserting floppy driver for 2.6.9-55.0.12.EL.XOS.1
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: Floppy drive(s): fd0 is 1.44M
compdev vgchange: Volume group "file-based" not found
compdev kernel: FDC 0 is a post-1991 82077
compdev lvm2-monitor: Starting monitoring for VG file-based: failed
compdev kernel: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: e100: Copyright(c) 1999-2005 Intel Corporation
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev kernel: e100: eth0: e100_probe: addr 0xfebfe000, irq 5, MAC addr 00:20:94:10:43:67
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: e100: eth1: e100_probe: addr 0xfebfd000, irq 11, MAC addr 00:20:94:10:43:68
compdev vgchange: Volume group "locking." not found
compdev kernel: USB Universal Host Controller Interface driver v2.2
compdev lvm2-monitor: Starting monitoring for VG locking.: failed
compdev kernel: PCI: Enabling device 0000:00:07.2 (0000 -> 0001)
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: PCI: No IRQ known for interrupt pin D of device 0000:00:07.2. Please try using pci=biosirq.
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: uhci_hcd 0000:00:07.2: Found HC with no IRQ.  Check BIOS/PCI 0000:00:07.2 setup!
compdev vgchange:
compdev kernel: md: Autodetecting RAID arrays.
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: md: autorun ...
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: md: ... autorun DONE.
compdev vgchange: Volume group "Volume" not found
compdev kernel: EXT3 FS on hda1, internal journal
compdev lvm2-monitor: Starting monitoring for VG Volume: failed
compdev kernel: Adding 787176k swap on /dev/hda2.  Priority:-1 extents:1
compdev vgchange:
compdev kernel: IA-32 Microcode Update Driver: v1.14 <tigran at veritas.com>
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: microcode: CPU0 updated from revision 0x7 to 0x8, date = 05052000
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: IA-32 Microcode Update Driver v1.14 unregistered
compdev vgchange:
compdev kernel: ip_tables: (C) 2000-2002 Netfilter core team
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: ip_tables: (C) 2000-2002 Netfilter core team
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
compdev vgchange: Volume group "Groups" not found
compdev kernel: NET: Registered protocol family 10
compdev lvm2-monitor: Starting monitoring for VG Groups: failed
compdev kernel: Disabled Privacy Extensions on device c0386e60(lo)
compdev vgchange:   connect() failed on local socket: Connection refused
compdev kernel: IPv6 over IPv4 tunneling driver
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev kernel: CMAN 2.6.9-50.2.0.6.XOS.1 (built Nov 15 2007 12:03:01) installed
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev kernel: NET: Registered protocol family 30
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: DLM 2.6.9-46.16.0.12.XOS.1 (built Nov 15 2007 12:27:30) installed
compdev vgchange: Volume group "with" not found
compdev lvm2-monitor: Starting monitoring for VG with: failed
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "the" not found
compdev lvm2-monitor: Starting monitoring for VG the: failed
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "clustered" not found
compdev lvm2-monitor: Starting monitoring for VG clustered: failed
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "attribute" not found
compdev lvm2-monitor: Starting monitoring for VG attribute: failed
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "will" not found
compdev lvm2-monitor: Starting monitoring for VG will: failed
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "be" not found
compdev lvm2-monitor: Starting monitoring for VG be: failed
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "inaccessible." not found
compdev lvm2-monitor: Starting monitoring for VG inaccessible.: failed
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange:   1 logical volume(s) in volume group "VolGroup01" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup01: succeeded
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange:   1 logical volume(s) in volume group "VolGroup02" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup02: succeeded
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:   connect() failed on local socket: Connection refused
compdev vgchange:   WARNING: Falling back to local file-based locking.
compdev vgchange:   Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange:   1 logical volume(s) in volume group "VolGroup03" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup03: succeeded
compdev kudzu:  succeeded
compdev sysctl: net.ipv4.ip_forward = 0
compdev sysctl: net.ipv4.conf.default.rp_filter = 1
compdev sysctl: net.ipv4.conf.default.accept_source_route = 0
compdev sysctl: kernel.sysrq = 0
compdev sysctl: kernel.core_uses_pid = 1
compdev sysctl: kernel.panic_on_oops = 1
compdev network: Setting network parameters:  succeeded
compdev network: Bringing up loopback interface:  succeeded
compdev network: Bringing up interface eth0:  succeeded
compdev ccsd[2458]: Remote copy of cluster.conf is from quorate node.
compdev ccsd[2458]:  Local version # : 80
compdev ccsd[2458]:  Remote version #: 80
compdev kernel: CMAN: Waiting to join or form a Linux-cluster
compdev kernel: CMAN: sending membership request
compdev ccsd[2458]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
compdev ccsd[2458]: Initial status:: Inquorate
compdev kernel: CMAN: got node cweb93
compdev kernel: CMAN: got node cweb94
compdev kernel: CMAN: got node cweb92
compdev kernel: CMAN: got node img62
compdev ccsd[2458]: Cluster is quorate.  Allowing connections.
compdev kernel: CMAN: quorum regained, resuming activity
compdev cman: startup succeeded
compdev fenced: startup succeeded
compdev clvmd: Cluster LVM daemon started - connected to CMAN
compdev clvmd: clvmd startup succeeded
compdev vgchange:   1 logical volume(s) in volume group "VolGroup03" now active
compdev vgchange:   1 logical volume(s) in volume group "VolGroup02" now active
compdev vgchange:   1 logical volume(s) in volume group "VolGroup01" now active
compdev clvmd: Activating VGs: succeeded
compdev netfs: Mounting other filesystems:  succeeded
compdev kernel: Lock_Harness 2.6.9-72.2.0.9.XOS.1 (built Nov 15 2007 12:30:46) installed
compdev kernel: GFS 2.6.9-72.2.0.9.XOS.1 (built Nov 15 2007 12:31:07) installed
compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:qm"
compdev kernel: Lock_DLM (built Nov 15 2007 12:30:48) installed
compdev kernel: GFS: fsid=vgcomp:qm.1: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Done
compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:web"
compdev kernel: GFS: fsid=vgcomp:web.3: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Done
compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
compdev kernel: GFS: fsid=vgcomp:web.3: Done
compdev gfs: Mounting GFS filesystems:  succeeded
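
One thing visible in the boot log above: the volume group names that lvm2-monitor fails on ("WARNING:", "Falling", "back", "to", ...) read, in order, like the words of the locking warning that vgchange prints before clvmd is up. That suggests the warning text is being treated as a list of VG names, but this is only a guess from the log and has not been checked against the lvm2-monitor init script. A minimal Python sketch to make the pattern easy to see follows; the function name failed_vg_names and the regular expression are made up for illustration.

```python
import re

# Matches lines such as:
#   "compdev lvm2-monitor: Starting monitoring for VG Falling: failed"
MONITOR = re.compile(
    r"lvm2-monitor: Starting monitoring for VG (?P<name>\S+): "
    r"(?P<result>failed|succeeded)$"
)


def failed_vg_names(lines):
    """Return, in log order, the VG names that lvm2-monitor reported as failed."""
    names = []
    for line in lines:
        m = MONITOR.search(line.rstrip())
        if m and m.group("result") == "failed":
            names.append(m.group("name"))
    return names


# A few lines copied from the boot log above
SAMPLE = [
    'compdev vgchange: Volume group "WARNING:" not found',
    "compdev lvm2-monitor: Starting monitoring for VG WARNING:: failed",
    "compdev lvm2-monitor: Starting monitoring for VG Falling: failed",
    "compdev lvm2-monitor: Starting monitoring for VG back: failed",
    "compdev lvm2-monitor: Starting monitoring for VG to: failed",
    "compdev lvm2-monitor: Starting monitoring for VG local: failed",
    "compdev lvm2-monitor: Starting monitoring for VG file-based: failed",
    "compdev lvm2-monitor: Starting monitoring for VG locking.: failed",
    "compdev lvm2-monitor: Starting monitoring for VG VolGroup01: succeeded",
]

if __name__ == "__main__":
    # Joining the failed names reproduces the first sentence of the warning
    print(" ".join(failed_vg_names(SAMPLE)))
```

Run over the full log, the same function should return the rest of the warning as well ("Volume", "Groups", "with", "the", "clustered", ...), while the three real volume groups are reported as succeeded.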





