[Linux-cluster] cluster issues - configuration OK?
Terry
td3201 at gmail.com
Sun Aug 26 16:14:07 UTC 2012
I have a two node cluster on RHEL 6.3. It is serving up three NFS mounts
and a Postgres 9.0 database. The database uses a GFS2 disk and the NFS
mount points are ext4. I can't seem to fail the services over between nodes
without a disable/enable. On top of that issue, please look over my
config and let me know where it can be improved in general. Here's a log
showing me trying to relocate postgres from one node to the other:
Aug 26 10:50:35 omadvnfs01c rgmanager[9149]: Stopping service service:postgresql90
Aug 26 10:50:35 omadvnfs01c rgmanager[19756]: [ip] Removing IPv4 address 10.198.1.112/24 from bond0
Aug 26 10:50:35 omadvnfs01c avahi-daemon[6596]: Withdrawing address record for 10.198.1.112 on bond0.
Aug 26 10:50:35 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:50:45 omadvnfs01c rsyslogd-2177: imuxsock lost 270 messages from pid 5431 due to rate-limiting
Aug 26 10:50:45 omadvnfs01c rgmanager[20118]: [script] Executing /etc/init.d/postgresql-9.0 stop
Aug 26 10:50:45 omadvnfs01c postgres[18312]: [2-1] LOG: received fast shutdown request
Aug 26 10:50:45 omadvnfs01c postgres[18312]: [3-1] LOG: aborting any active transactions
Aug 26 10:50:45 omadvnfs01c postgres[19284]: [10-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19207]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19102]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19100]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19099]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19141]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19142]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19072]: [2-1] LOG: autovacuum launcher shutting down
Aug 26 10:50:45 omadvnfs01c postgres[19138]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19137]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19139]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19134]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19110]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19136]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19098]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19101]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19140]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19135]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19133]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:46 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:50:55 omadvnfs01c nrpe[20652]: Error: Could not complete SSL handshake. 5
Aug 26 10:50:55 omadvnfs01c rsyslogd-2177: imuxsock lost 352 messages from pid 5431 due to rate-limiting
Aug 26 10:50:57 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:05 omadvnfs01c rsyslogd-2177: imuxsock lost 32 messages from pid 5431 due to rate-limiting
Aug 26 10:51:15 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:24 omadvnfs01c rsyslogd-2177: imuxsock lost 212 messages from pid 5431 due to rate-limiting
Aug 26 10:51:27 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:45 omadvnfs01c rsyslogd-2177: imuxsock lost 38 messages from pid 5431 due to rate-limiting
Aug 26 10:51:46 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:46 omadvnfs01c rgmanager[22393]: [script] script:postgresql90-init: stop of /etc/init.d/postgresql-9.0 failed (returned 1)
Aug 26 10:51:46 omadvnfs01c rgmanager[9149]: stop on script "postgresql90-init" returned 1 (generic error)
Aug 26 10:51:46 omadvnfs01c rgmanager[22492]: [fs] unmounting /data03
Aug 26 10:51:46 omadvnfs01c rgmanager[22533]: [fs] Sending SIGTERM to processes on /data03
Aug 26 10:51:52 omadvnfs01c rsyslogd-2177: imuxsock lost 248 messages from pid 5431 due to rate-limiting
Aug 26 10:51:52 omadvnfs01c rgmanager[22636]: [fs] unmounting /data03
Aug 26 10:51:52 omadvnfs01c rgmanager[22677]: [fs] Sending SIGKILL to processes on /data03
Aug 26 10:51:55 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:57 omadvnfs01c rgmanager[23435]: [fs] unmounting /data03
Aug 26 10:51:58 omadvnfs01c rsyslogd-2177: imuxsock lost 344 messages from pid 5431 due to rate-limiting
Aug 26 10:51:58 omadvnfs01c rgmanager[9149]: #12: RG service:postgresql90 failed to stop; intervention required
Aug 26 10:51:58 omadvnfs01c rgmanager[9149]: Service service:postgresql90 is failed
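The key line above is the stop action returning 1: rgmanager treats any nonzero exit status from a script resource's stop as an unrecoverable failure, marks the service failed, and won't start it anywhere until it is disabled and re-enabled. A minimal sketch of that interpretation (the `stop_service` function below is a stand-in for `/etc/init.d/postgresql-9.0 stop`, not the real init script):

```shell
#!/bin/sh
# Stand-in for the init script's stop action: its argument is the
# exit status it reports back, mimicking LSB-style semantics.
stop_service() {
    return "$1"
}

# Exit 0: rgmanager considers the stop clean and continues the relocate.
if stop_service 0; then
    echo "stop returned 0: service relocates"
fi

# Any nonzero exit: rgmanager marks the service failed ("intervention
# required"), which is why a disable/enable is needed afterwards.
if ! stop_service 1; then
    echo "stop returned 1: service marked failed"
fi
```

Running the stop action by hand on the active node and checking `echo $?` should reproduce the nonzero status and show what the script is actually complaining about.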
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="166" name="omadvnfs01">
    <cman expected_votes="1" two_node="1"/>
    <clusternodes>
        <clusternode name="omadvnfs01c.sec.jel.lc" nodeid="1">
            <fence>
                <method name="drac">
                    <device name="omadvnfs01c-drac"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="omadvnfs01b.sec.jel.lc" nodeid="2">
            <fence>
                <method name="drac">
                    <device name="omadvnfs01b-drac"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_drac5" ipaddr="10.98.1.213" login="root"
            module_name="omadvnfs01c" name="omadvnfs01c-drac" passwd="narf" secure="on"/>
        <fencedevice agent="fence_drac5" ipaddr="10.98.1.212" login="root"
            module_name="omadvnfs01b" name="omadvnfs01b-drac" passwd="narf" secure="on"/>
    </fencedevices>
    <rm>
        <resources>
            <nfsexport name="data01a"/>
            <nfsexport name="data01b"/>
            <nfsexport name="data01c"/>
            <nfsclient allow_recover="on" name="omadvdss01a"
                options="rw,no_root_squash,async" target="omadvdss01a"/>
            <nfsclient allow_recover="on" name="omadvdss01b"
                options="rw,no_root_squash,async" target="omadvdss01b"/>
            <nfsclient allow_recover="on" name="omadvdss01c"
                options="rw,no_root_squash,async" target="omadvdss01c"/>
            <script file="/etc/init.d/postgresql-9.0" name="postgresql90-init"/>
            <script file="/etc/init.d/postgresql-9.1" name="postgresql91-init"/>
            <ip address="10.198.1.112" monitor_link="on" sleeptime="10"/>
            <ip address="10.198.1.113" monitor_link="on" sleeptime="10"/>
            <ip address="10.198.1.114" monitor_link="on" sleeptime="10"/>
            <ip address="10.198.1.115" monitor_link="on" sleeptime="10"/>
            <script file="/etc/init.d/postgresql-8.4" name="postgresql84-init"/>
            <fs device="/dev/vg_data01a/lv_data01a" force_unmount="1" fsid="18521"
                self_fence="1" fstype="ext4" mountpoint="/data01a"
                name="omadvnfs01-data01a" nfslock="1"
                options="noatime,nodiratime,data=writeback,commit=30"/>
            <fs device="/dev/vg_data01b/lv_data01b" force_unmount="1" fsid="6623"
                self_fence="1" fstype="ext4" mountpoint="/data01b"
                name="omadvnfs01-data01b" nfslock="1"
                options="noatime,nodiratime,data=writeback,commit=30"/>
            <fs device="/dev/vg_data01c/lv_data01c" force_unmount="1" fsid="91523"
                self_fence="1" fstype="ext4" mountpoint="/data01c"
                name="omadvnfs01-data01c" nfslock="1"
                options="noatime,nodiratime,data=writeback,commit=30"/>
            <fs device="/dev/vg_data03/lv_data03" force_unmount="1" force_fsck="1"
                self_fence="1" fsid="15631" fstype="gfs2" mountpoint="/data03"
                name="omadvnfs01-data03" options=""/>
        </resources>
        <failoverdomains>
            <failoverdomain name="fd_omadvnfs01c" nofailback="1" ordered="1" restricted="0">
                <failoverdomainnode name="omadvnfs01c.sec.jel.lc" priority="1"/>
                <failoverdomainnode name="omadvnfs01b.sec.jel.lc" priority="2"/>
            </failoverdomain>
            <failoverdomain name="fd_omadvnfs01b" nofailback="1" ordered="1" restricted="0">
                <failoverdomainnode name="omadvnfs01b.sec.jel.lc" priority="1"/>
                <failoverdomainnode name="omadvnfs01c.sec.jel.lc" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <service domain="fd_omadvnfs01b" name="omadvnfs01-nfs-data01b" nfslock="1" recovery="relocate">
            <fs ref="omadvnfs01-data01b">
                <nfsexport ref="data01b">
                    <ip ref="10.198.1.114"/>
                    <nfsclient ref="omadvdss01a"/>
                    <nfsclient ref="omadvdss01b"/>
                    <nfsclient ref="omadvdss01c"/>
                </nfsexport>
            </fs>
        </service>
        <service domain="fd_omadvnfs01c" name="omadvnfs01-nfs-data01a" nfslock="1" recovery="relocate">
            <fs ref="omadvnfs01-data01a">
                <nfsexport ref="data01a">
                    <ip ref="10.198.1.113"/>
                    <nfsclient ref="omadvdss01a"/>
                    <nfsclient ref="omadvdss01b"/>
                    <nfsclient ref="omadvdss01c"/>
                </nfsexport>
            </fs>
        </service>
        <service domain="fd_omadvnfs01c" name="omadvnfs01-nfs-data01c" nfslock="1" recovery="relocate">
            <fs ref="omadvnfs01-data01c">
                <nfsexport ref="data01c">
                    <ip ref="10.198.1.115"/>
                    <nfsclient ref="omadvdss01a"/>
                    <nfsclient ref="omadvdss01b"/>
                    <nfsclient ref="omadvdss01c"/>
                </nfsexport>
            </fs>
        </service>
        <service domain="fd_omadvnfs01b" name="postgresql90" recovery="relocate">
            <ip ref="10.198.1.112"/>
            <fs ref="omadvnfs01-data03">
                <script ref="postgresql90-init"/>
            </fs>
        </service>
    </rm>
    <logging debug="on" logfile="/var/log/cluster.log" logfile_priority="debug"/>
</cluster>
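For what it's worth, if the immediate goal is to keep a stop problem in the postgres script from stranding the whole service, rgmanager supports marking a resource subtree as independent/non-critical via the `__independent_subtree` attribute. A hedged sketch of the postgresql90 service with that attribute on the script resource (the semantics here are my understanding of rgmanager's resource-tree handling; verify against the rgmanager documentation for your release before relying on it):

```xml
<service domain="fd_omadvnfs01b" name="postgresql90" recovery="relocate">
    <ip ref="10.198.1.112"/>
    <fs ref="omadvnfs01-data03">
        <!-- Marking the script subtree non-critical means a failure in it
             is handled within the subtree rather than failing the whole
             service and requiring a manual disable/enable. -->
        <script ref="postgresql90-init" __independent_subtree="2"/>
    </fs>
</service>
```

This is a workaround, not a fix; the init script's stop action still needs to be made reliably zero-exit (including when the server is already stopped) for clean relocation.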
There's nothing of interest in my cluster.log file during the time when I
attempted to relocate.
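One thing that may explain the empty cluster.log: the rsyslogd lines show hundreds of messages from pid 5431 being dropped by imuxsock rate limiting right around the failed stop, so the interesting output may simply have been discarded. The rate limiting can be relaxed in /etc/rsyslog.conf (these imuxsock directives exist in the rsyslog 5.x shipped with RHEL 6; a sketch, adjust values to taste):

```
# /etc/rsyslog.conf -- relax imuxsock rate limiting so the messages
# dropped around the failed stop actually get logged.
$SystemLogRateLimitInterval 0    # 0 disables rate limiting entirely
# or keep limiting but raise the threshold:
# $SystemLogRateLimitInterval 5
# $SystemLogRateLimitBurst 5000
```

After a `service rsyslog restart`, retrying the relocate should capture whatever pid 5431 was saying during the stop.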