[Linux-cluster] Multiple "rgmanager" instances after re-booting from a kernel panic.

Demetres Pantermalis dpant at intracom-telecom.com
Wed Jan 29 10:36:59 UTC 2014


Please find attached the cluster.conf file and the relevant logs from
both servers.

Two scenarios were executed:
1) From 11:48:00 until 11:55 (this is the normal, expected behaviour)
app01 is active. Kernel panic on app01 at 11:48:00
app02 takes over the service normally
app01 re-joins the cluster at 11:50:00
Kernel panic on app02 at 11:50:45
app01 starts the service normally
app02 re-joins the cluster correctly

2) From 11:55:30 until the end (this is where the problem appears)
app01 is active. Kernel panic on app01 at 11:55:30
app02 takes over the service normally
app01 re-joins the cluster at 11:57:07
The service is manually migrated to app01 at 11:58:40
The service starts normally on app01
Kernel panic on app01 at 12:00:35
The service resumes normally on app02
app01 re-joins the cluster at 12:02:09
After that, the clustat output on node app02 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:46 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 adr-par-app01-hb                            1 Online
 adr-par-app02-hb                            2 Online, Local, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:sv-CPAR                adr-par-app02-hb               started

and on node app01 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:43 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 adr-par-app01-hb                            1 Online, Local
 adr-par-app02-hb                            2 Online
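
The visible difference between the two outputs is the "rgmanager" flag on the local member line (present on app02, missing on app01). A minimal Python sketch of that check, assuming the clustat member-table layout shown above (member name, numeric ID, comma-separated status flags); the function name is hypothetical and not part of any cluster tool:

```python
def local_rgmanager_flag(clustat_text):
    """Report whether the member flagged 'Local' also carries 'rgmanager'.

    Returns True or False for the local member row, or None when no row
    with a 'Local' flag is found. Assumes rows look like:
        <name> <numeric id> <flag>, <flag>, ...
    """
    for line in clustat_text.splitlines():
        parts = line.split(None, 2)
        # Member rows have a numeric ID in the second column; header,
        # separator and service rows do not and are skipped.
        if len(parts) == 3 and parts[1].isdigit():
            flags = [flag.strip() for flag in parts[2].split(",")]
            if "Local" in flags:
                return "rgmanager" in flags
    return None
```

Fed the app02 output above it returns True; fed the app01 output it returns False, which matches the symptom described here.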

The output of "ps -ef | grep rgmanager" on node app01 is:
root      4034     1  0 12:02 ?        00:00:00 rgmanager
root      4036  4034  0 12:02 ?        00:00:00 rgmanager
root      4175  4036  0 12:02 ?        00:00:00 rgmanager

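The problem is that rgmanager is no longer active on node app01, even
though its processes are still present: clustat on app01 shows neither
the rgmanager flag for the local member nor any services.
As a workaround, killing the last process (pid 4175) makes rgmanager
resume without a restart.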
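
To make "the last process" easier to pick out, here is a small Python sketch that reconstructs the rgmanager parent/child chain from "ps -ef" output. It assumes the standard eight-column "ps -ef" layout (UID PID PPID C STIME TTY TIME CMD); the function name is hypothetical. It only reports PIDs and kills nothing, and it makes no claim about how many rgmanager processes a healthy node should have.

```python
def rgmanager_chains(ps_text):
    """Return rgmanager process chains, each ordered from ancestor to leaf.

    Only lines whose command is exactly 'rgmanager' are considered, so a
    'grep rgmanager' line in the same listing is ignored.
    """
    parent_of = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue
        pid, ppid, cmd = fields[1], fields[2], fields[7]
        if pid.isdigit() and ppid.isdigit() and cmd.split()[0] == "rgmanager":
            parent_of[int(pid)] = int(ppid)

    # A leaf is an rgmanager process that is not the parent of another one.
    parents = set(parent_of.values())
    chains = []
    for leaf in sorted(p for p in parent_of if p not in parents):
        chain = [leaf]
        # Walk upwards while the parent is itself an rgmanager process.
        while parent_of[chain[-1]] in parent_of:
            chain.append(parent_of[chain[-1]])
        chains.append(chain[::-1])
    return chains
```

For the listing above this gives a single chain, [4034, 4036, 4175]; the final entry is the process that was killed in the workaround.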


Thanks for your help.

BR,
Demetres
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1766 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages_app01.txt.gz
Type: application/x-gzip
Size: 42096 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages_app02.txt.gz
Type: application/x-gzip
Size: 15480 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment-0001.bin>