[Linux-cluster] Multiple "rgmanager" instances after re-booting from a kernel panic.
Demetres Pantermalis
dpant at intracom-telecom.com
Wed Jan 29 10:36:59 UTC 2014
Please find attached the cluster.conf file and the relevant logs from
both servers.
Two scenarios were executed:
1) From 11:48:00 until 11:55 (this is the normal/expected behaviour)
app01 is active. Kernel panic on app01 at 11:48:00
app02 takes over the service normally
app01 re-joins the cluster at 11:50:00
Kernel panic on app02 at 11:50:45
app01 starts the service normally
app02 re-joins the cluster correctly
2) From 11:55:30 until the end (this is where the problem appears)
app01 is active. Kernel panic on app01 at 11:55:30
app02 takes over the service normally
app01 re-joins the cluster at 11:57:07
The service is manually migrated to app01 at 11:58:40
The service starts normally on app01
Kernel panic on app01 at 12:00:35
The service resumes normally on app02
app01 re-joins the cluster at 12:02:09
After that, the clustat output on node app02 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:46 2014
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 adr-par-app01-hb        1 Online
 adr-par-app02-hb        2 Online, Local, rgmanager

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:sv-CPAR      adr-par-app02-hb     started

and on node app01 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:43 2014
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 adr-par-app01-hb        1 Online, Local
 adr-par-app02-hb        2 Online

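The difference between the two views is the missing "rgmanager" flag. A minimal Python sketch of how that check could be automated is shown here; it is only an illustration, not part of any cluster tooling. The function name members_with_rgmanager and the regular expression are assumptions of this sketch, and it only understands member lines of the form "name id flags" as they appear in the output above.

```python
import re

# Hypothetical helper: matches member lines such as
# " adr-par-app02-hb   2 Online, Local, rgmanager" (name, numeric ID, flags).
MEMBER_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(.+?)\s*$")


def members_with_rgmanager(clustat_text):
    """Map each member name to True if its status flags include 'rgmanager'."""
    result = {}
    for line in clustat_text.splitlines():
        match = MEMBER_LINE.match(line)
        if not match:
            continue  # headers, separators and service lines have no numeric ID
        name, _node_id, status = match.groups()
        flags = [flag.strip() for flag in status.split(",")]
        result[name] = "rgmanager" in flags
    return result


# Member section copied from the app02 output above.
APP02_VIEW = """\
Cluster Status for par_clu @ Wed Jan 29 12:30:46 2014
Member Status: Quorate
 Member Name ID Status
 ------ ---- ---- ------
 adr-par-app01-hb 1 Online
 adr-par-app02-hb 2 Online, Local, rgmanager
"""

if __name__ == "__main__":
    for member, has_flag in members_with_rgmanager(APP02_VIEW).items():
        print(member, "rgmanager" if has_flag else "NO rgmanager flag")
```

Run against the app02 output it reports app01 as lacking the flag; in practice the text would come from running clustat on each node.
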
The output of "ps -ef | grep rgmanager" on node app01 is:
root 4034 1 0 12:02 ? 00:00:00 rgmanager
root 4036 4034 0 12:02 ? 00:00:00 rgmanager
root 4175 4036 0 12:02 ? 00:00:00 rgmanager
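The problem is that rgmanager is no longer active on node app01: clustat
there shows no rgmanager flag and no service section, even though three
rgmanager processes are running.
As a workaround, killing the last process (pid 4175) makes rgmanager
resume without a restart.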
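
To make the workaround easier to repeat, the parent/child chain in the "ps -ef" output can be extracted with a short Python sketch like the one below. It is only an illustration under stated assumptions: it expects the standard "ps -ef" column order (UID PID PPID C STIME TTY TIME CMD), the name rgmanager_chain is made up here, and it deliberately only prints the chain rather than killing anything.

```python
import os


def rgmanager_chain(ps_text):
    """Return rgmanager PIDs ordered from the top-level parent to the deepest child."""
    parent_of = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header line or anything that is not a process row
        command = os.path.basename(fields[7].split()[0])
        if command != "rgmanager":
            continue  # skips e.g. the "grep rgmanager" process itself
        parent_of[int(fields[1])] = int(fields[2])

    def depth(pid):
        # Count how many rgmanager ancestors this PID has.
        levels = 0
        while parent_of.get(pid) in parent_of:
            pid = parent_of[pid]
            levels += 1
        return levels

    return sorted(parent_of, key=depth)


# Output copied from node app01 above.
PS_OUTPUT = """\
root 4034 1 0 12:02 ? 00:00:00 rgmanager
root 4036 4034 0 12:02 ? 00:00:00 rgmanager
root 4175 4036 0 12:02 ? 00:00:00 rgmanager
"""

if __name__ == "__main__":
    chain = rgmanager_chain(PS_OUTPUT)
    print("rgmanager chain:", chain)
    print("deepest child (the one killed in the workaround):", chain[-1])
```

For the output above, the last element of the chain is the pid that was killed by hand.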
Thanks for your help.
BR,
Demetres
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1766 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages_app01.txt.gz
Type: application/x-gzip
Size: 42096 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages_app02.txt.gz
Type: application/x-gzip
Size: 15480 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20140129/4e847583/attachment-0001.bin>