[Linux-cluster] failover questions after upgrade
Lon Hohberger
lhh at redhat.com
Wed Nov 15 17:02:23 UTC 2006
On Tue, 2006-11-14 at 20:06 -0500, jason at monsterjam.org wrote:
>
> and when I reboot both servers of the 2-node cluster, they come up fine.
> [jason at tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
> Member Name State ID
> ------ ---- ----- --
> tf1 Online 0x0000000000000001
> tf2 Online 0x0000000000000002
>
> Service Name Owner (Last) State
> ------- ---- ----- ------ -----
> Apache Service tf1 started
> [jason at tf2 ~]$
>
> when I reboot (shutdown -r now) tf1,
> tf2 never takes over
>
> [jason at tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
> Member Name State ID
> ------ ---- ----- --
> tf2 Online 0x0000000000000002
>
> Service Name Owner (Last) State
> ------- ---- ----- ------ -----
> Apache Service ((null) ) failed
> [jason at tf2 ~]$
>
> here are the logs from tf2:
>
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> Logged in SG "usrm::manager"
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> State change: Local UP
> Nov 14 19:48:22 tf2 clurgmgrd[5345]: <info> State change: tf1 UP
> Nov 14 19:48:25 tf2 snmpd[5195]: Got trap from peer on fd 13
> Nov 14 19:48:44 tf2 kernel: process `omaws32' is using obsolete setsockopt SO_BSDCOMPAT
> Nov 14 19:48:58 tf2 Server Administrator: Storage Service EventID: 2164 See readme.txt for a list of validated controller driver versions.
> Nov 14 19:49:00 tf2 snmpd[5195]: Got trap from peer on fd 13
> Nov 14 19:50:31 tf2 sshd(pam_unix)[6920]: session opened for user jason by (uid=0)
> Nov 14 19:51:03 tf2 sshd(pam_unix)[6951]: session opened for user jason by (uid=0)
>
> Nov 14 19:51:39 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:51:39 tf2 clurgmgrd[5345]: <info> State change: tf1 DOWN
> Nov 14 19:52:19 tf2 ntpd[4896]: synchronized to 193.162.159.97, stratum 2
> Nov 14 19:52:19 tf2 ntpd[4896]: kernel time sync disabled 0041
> Nov 14 19:52:28 tf2 kernel: e100: eth2: e100_watchdog: link down
> Nov 14 19:52:34 tf2 kernel: CMAN: removing node tf1 from the cluster : Missed too many heartbeats
> Nov 14 19:52:58 tf2 kernel: e100: eth2: e100_watchdog: link up, 100Mbps, full-duplex
> Nov 14 19:55:14 tf2 kernel: CMAN: node tf1 rejoining
> Nov 14 19:55:45 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:55:45 tf2 clurgmgrd[5345]: <info> State change: tf1 UP
>
>
> then when tf1 comes back up, my Apache service doesn't come up correctly.
>
> [jason at tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
> Member Name State ID
> ------ ---- ----- --
> tf1 Online 0x0000000000000001
> tf2 Online 0x0000000000000002
>
> Service Name Owner (Last) State
> ------- ---- ----- ------ -----
> Apache Service (tf1 ) failed
> [jason at tf2 ~]$
>
>
> and I see this in the logs on tf1 as it's booting up:
> Nov 14 19:55:44 tf1 rhnsd[5445]: Red Hat Network Services Daemon starting up.
> Nov 14 19:55:44 tf1 rhnsd: rhnsd startup succeeded
> Nov 14 19:55:44 tf1 cups-config-daemon: cups-config-daemon startup succeeded
> Nov 14 19:55:44 tf1 haldaemon: haldaemon startup succeeded
> Nov 14 19:55:44 tf1 clurgmgrd[5488]: <info> Loading Service Data
> Nov 14 19:55:44 tf1 rgmanager: clurgmgrd startup succeeded
> Nov 14 19:55:44 tf1 fstab-sync[5764]: removed all generated mount points
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Initializing Services
> Nov 14 19:55:45 tf1 fstab-sync[6152]: added mount point /media/cdrom for /dev/hda
> Nov 14 19:55:45 tf1 httpd: httpd shutdown failed
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <notice> stop on script "cluster_apache" returned 1 (generic error)
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Services Initialized
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Logged in SG "usrm::manager"
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Magma Event: Membership Change
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> State change: Local UP
> Nov 14 19:55:46 tf1 fstab-sync[6465]: added mount point /media/floppy for /dev/fd0
> Nov 14 19:55:46 tf1 clurgmgrd[5488]: <info> State change: tf2 UP
>
> any suggestions?
>
http://sources.redhat.com/cluster/faq.html#rgm_wontrestart
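The init script is probably returning 1 for stop-after-stop (or
stop-when-stopped) when it should be returning 0. That matches the
"httpd shutdown failed" and 'stop on script "cluster_apache" returned 1'
lines in your tf1 log: rgmanager treats a non-zero return from stop as a
failure, which is why the service ends up in the failed state. This is a
bug in the initscripts package, and here's a patch to
/etc/init.d/functions to make httpd work normally: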
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=111998
-- Lon