[Linux-cluster] frozen services are stopped when rgmanager is restarted

Martin Waite Martin.Waite at datacash.com
Mon Jun 21 15:50:54 UTC 2010


Hi,

 

RHEL 5.4:  cluster2 (I think).

 

I expected to be able to freeze a service on a node and restart
rgmanager on that node without interrupting the service.   In practice,
starting rgmanager causes the service to be stopped.  

 

Is this what is supposed to happen?  I thought the whole point of
freezing services was to allow maintenance (including restarting cluster
software).

 

Are there any options to prevent the services from being stopped when
rgmanager is started?

 

One effect of rgmanager stopping the service is that the cluster reaches
an inconsistent state.  Once rgmanager has restarted, the cluster
believes that the services are still frozen and started, whereas in
reality they are stopped.  Any attempt to unfreeze a service causes it
to fail over to a standby node.
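
To spot the mismatch, one option is to list the services that clustat still reports as frozen and then check by hand whether their resources are really up. This is only a minimal sketch: it assumes clustat prints each service on a single line with a trailing [Z] flag, as in the output further down, and the function name frozen_services is made up for illustration.

```shell
# Hypothetical helper: print the names of services that clustat
# reports as frozen (flagged with [Z]).  Reads clustat output on stdin.
frozen_services() {
    awk '$1 ~ /^service:/ && /\[Z\]/ { sub(/^service:/, "", $1); print $1 }'
}

# Usage on a cluster node (assumes the same paths as in this post):
#   sudo /usr/sbin/clustat | frozen_services
```

Each name printed can then be compared with what is actually running on the node, for example whether the service IP address is still configured on the interface.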

 

regards,

Martin

 

 

sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:27:05 2010
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 svXprdclu001                                1 Online, rgmanager
 svXprdclu002                                2 Online, Local, rgmanager
 svXprdclu003                                3 Online, rgmanager
 svXprdclu004                                4 Online, rgmanager
 svXprdclu005                                5 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:ACTIVESITE             svXprdclu002                   started
 service:MASTERVIP              svXprdclu002                   started

 

[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z ACTIVESITE
Local machine freezing service:ACTIVESITE...Success

[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z MASTERVIP
Local machine freezing service:MASTERVIP...Success

 

[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:34:02 2010
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 svXprdclu001                                1 Online, rgmanager
 svXprdclu002                                2 Online, Local, rgmanager
 svXprdclu003                                3 Online, rgmanager
 svXprdclu004                                4 Online, rgmanager
 svXprdclu005                                5 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:ACTIVESITE             svXprdclu002                   started    [Z]
 service:MASTERVIP              svXprdclu002                   started    [Z]

 

[martin@cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:                              [  OK  ]
Cluster Service Manager is stopped.

[martin@cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager start
Starting Cluster Service Manager:                          [  OK  ]

 

#
# the services are stopped by rgmanager start.  Ugh!
#

 

[martin@cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:35:34 2010
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 svXprdclu001                                1 Online, rgmanager
 svXprdclu002                                2 Online, Local, rgmanager
 svXprdclu003                                3 Online, rgmanager
 svXprdclu004                                4 Online, rgmanager
 svXprdclu005                                5 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:ACTIVESITE             svXprdclu002                   started    [Z]
 service:MASTERVIP              svXprdclu002                   started    [Z]

 

=========================================

 

The logs show that the service is stopped as rgmanager is started on
svXprdclu002.  

 

Jun 21 16:31:19 cp1edidbm002 clurgmgrd: [14256]: <info> Executing /home/martin/dc-dsm status
Jun 21 16:34:58 cp1edidbm002 rgmanager: [15526]: <notice> Shutting down Cluster Service Manager...
Jun 21 16:34:58 cp1edidbm002 clurgmgrd[14256]: <notice> Shutting down
Jun 21 16:35:08 cp1edidbm002 clurgmgrd[14256]: <notice> Shutdown complete, exiting
Jun 21 16:35:08 cp1edidbm002 rgmanager: [15526]: <notice> Cluster Service Manager is stopped.

Jun 21 16:35:16 cp1edidbm002 kernel: dlm: Using TCP for communications
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 4
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 5
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 1
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 3
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <notice> Resource Group Manager Starting
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Loading Service Data
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Initializing Services
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /bin/true stop
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Removing IPv4 address 10.3.17.20/24 from bond0
Jun 21 16:35:27 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /home/martin/dc-dsm stop
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> Services Initialized
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: Local UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu001 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu003 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu004 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu005 UP
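
The stop actions all fall between the "Initializing Services" and "Services Initialized" messages. To pull just that window out of a longer log, something along these lines might do. It is a rough sketch: it assumes each syslog entry sits on one line, the helper name init_actions is hypothetical, and the log path in the usage comment is an assumption about the syslog configuration.

```shell
# Hypothetical helper: print what clurgmgrd logged between its
# "Initializing Services" and "Services Initialized" messages.
# Reads syslog lines on stdin, one entry per line.
init_actions() {
    awk '/clurgmgrd/ && /Initializing Services/ { inside = 1; next }
         /clurgmgrd/ && /Services Initialized/  { inside = 0 }
         inside && /clurgmgrd/'
}

# Usage (assumes syslog writes to /var/log/messages):
#   sudo cat /var/log/messages | init_actions
```

Run against the excerpt above, this would leave only the two "Executing ... stop" lines and the "Removing IPv4 address" line, which are the actions that take the frozen services down.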

 
