[Linux-cluster] frozen services are stopped when rgmanager is restarted
Martin Waite
Martin.Waite at datacash.com
Mon Jun 21 15:50:54 UTC 2010
Hi,
RHEL 5.4: cluster2 (I think).
I expected to be able to freeze a service on a node and restart
rgmanager on that node without interrupting the service. In practice,
starting rgmanager causes the service to be stopped.
Is this what is supposed to happen? I thought the whole point of
freezing services was to allow maintenance (including restarting cluster
software).
Are there any options to prevent the services from being stopped when
rgmanager is started?
One effect of rgmanager stopping the service is that the cluster reaches
an inconsistent state. Once rgmanager has restarted, the cluster
believes that the services are still frozen, whereas in reality they are
stopped. Any attempt to unfreeze the service causes the service to
fail over to a standby node.
regards,
Martin
sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:27:05 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 svXprdclu001          1 Online, rgmanager
 svXprdclu002          2 Online, Local, rgmanager
 svXprdclu003          3 Online, rgmanager
 svXprdclu004          4 Online, rgmanager
 svXprdclu005          5 Online, rgmanager

 Service Name           Owner (Last)     State
 ------- ----           ----- ------     -----
 service:ACTIVESITE     svXprdclu002     started
 service:MASTERVIP      svXprdclu002     started

[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z ACTIVESITE
Local machine freezing service:ACTIVESITE...Success
[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z MASTERVIP
Local machine freezing service:MASTERVIP...Success
[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:34:02 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 svXprdclu001          1 Online, rgmanager
 svXprdclu002          2 Online, Local, rgmanager
 svXprdclu003          3 Online, rgmanager
 svXprdclu004          4 Online, rgmanager
 svXprdclu005          5 Online, rgmanager

 Service Name           Owner (Last)     State
 ------- ----           ----- ------     -----
 service:ACTIVESITE     svXprdclu002     started [Z]
 service:MASTERVIP      svXprdclu002     started [Z]

[martin@cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop: [ OK ]
Cluster Service Manager is stopped.
[martin@cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager start
Starting Cluster Service Manager: [ OK ]
#
# the services are stopped by rgmanager start. Ugh!
#
[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:35:34 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 svXprdclu001          1 Online, rgmanager
 svXprdclu002          2 Online, Local, rgmanager
 svXprdclu003          3 Online, rgmanager
 svXprdclu004          4 Online, rgmanager
 svXprdclu005          5 Online, rgmanager

 Service Name           Owner (Last)     State
 ------- ----           ----- ------     -----
 service:ACTIVESITE     svXprdclu002     started [Z]
 service:MASTERVIP      svXprdclu002     started [Z]

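
Because clustat keeps reporting "started [Z]" even after the resources have been stopped, a script that cross-checks the reported state against the real state first has to extract the frozen flag. The following is only a minimal sketch, assuming the text layout of the clustat output above (a "service:" name followed by owner, state and an optional "[Z]"). The function name parse_services is hypothetical and not part of any cluster tool.

```python
def parse_services(clustat_text):
    """Return {service_name: (owner, state, frozen)} from clustat text output.

    Assumption: each service entry is the token "service:<name>" followed
    by an owner token, a state token and, if frozen, the token "[Z]".
    Working on whitespace-separated tokens makes this tolerant of output
    whose rows have been wrapped across lines.
    """
    tokens = clustat_text.split()
    services = {}
    for i, tok in enumerate(tokens):
        if not tok.startswith("service:"):
            continue
        name = tok[len("service:"):]
        rest = tokens[i + 1:i + 4]      # owner, state, optional flag
        if len(rest) < 2:
            continue                    # truncated entry, skip it
        owner, state = rest[0], rest[1]
        frozen = len(rest) > 2 and rest[2] == "[Z]"
        services[name] = (owner, state, frozen)
    return services


# Sample rows: one frozen service and one unfrozen service.
SAMPLE = """\
 service:ACTIVESITE     svXprdclu002     started [Z]
 service:MASTERVIP      svXprdclu002     started
"""

if __name__ == "__main__":
    for name, (owner, state, frozen) in parse_services(SAMPLE).items():
        print(name, owner, state, "frozen" if frozen else "not frozen")
```

Any service reported as frozen and started could then be compared with an independent check, for example the application's own status script, to expose the mismatch described above.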
=========================================
The logs show that the service is stopped as rgmanager is started on
svXprdclu002.
Jun 21 16:31:19 cp1edidbm002 clurgmgrd: [14256]: <info> Executing /home/martin/dc-dsm status
Jun 21 16:34:58 cp1edidbm002 rgmanager: [15526]: <notice> Shutting down Cluster Service Manager...
Jun 21 16:34:58 cp1edidbm002 clurgmgrd[14256]: <notice> Shutting down
Jun 21 16:35:08 cp1edidbm002 clurgmgrd[14256]: <notice> Shutdown complete, exiting
Jun 21 16:35:08 cp1edidbm002 rgmanager: [15526]: <notice> Cluster Service Manager is stopped.
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: Using TCP for communications
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 4
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 5
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 1
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 3
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <notice> Resource Group Manager Starting
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Loading Service Data
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Initializing Services
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /bin/true stop
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Removing IPv4 address 10.3.17.20/24 from bond0
Jun 21 16:35:27 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /home/martin/dc-dsm stop
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> Services Initialized
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: Local UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu001 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu003 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu004 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu005 UP
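
In the log above, the stop operations all fall between the "Initializing Services" and "Services Initialized" messages. As a rough sketch, assuming only the message wording seen in this log, the following lists the operations that rgmanager executed in that window. The function name ops_during_init is hypothetical, and the pattern picks up only "Executing <script> <operation>" messages, so the IP address removal is not reported.

```python
import re


def ops_during_init(log_text):
    """List (script, operation) pairs logged while services were initialized.

    Assumption: the window of interest is delimited by the messages
    "Initializing Services" and "Services Initialized", as in the log
    excerpt in this post. Whitespace is normalized first so that log
    lines wrapped by a mail client are still matched.
    """
    flat = " ".join(log_text.split())
    start = flat.find("Initializing Services")
    end = flat.find("Services Initialized")
    if start == -1 or end == -1 or end < start:
        return []
    window = flat[start:end]
    return re.findall(r"Executing (\S+) (\w+)", window)


# Excerpt copied from the log in this post.
SAMPLE_LOG = """\
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Initializing Services
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /bin/true stop
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Removing IPv4 address 10.3.17.20/24 from bond0
Jun 21 16:35:27 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /home/martin/dc-dsm stop
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> Services Initialized
"""

if __name__ == "__main__":
    for script, operation in ops_during_init(SAMPLE_LOG):
        print(script, operation)
```

Run against the excerpt, this reports a stop for /bin/true and for /home/martin/dc-dsm, which matches the observation that the services are stopped while rgmanager starts.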