[Linux-cluster] Strange starting order during rgmanager starting and every failover

Poós Krisztián krisztian at poos.hu
Mon Aug 13 12:28:39 UTC 2012


Dear All,

After successfully solving the HA-LVM/clvmd issue, I am now seeing
strange cluster behavior during the startup of the SAP service group.
Before starting the service group, rgmanager tries to start/stop the SAP
instance and mount the disks (although the service itself has not been
started yet)...
After these attempts fail, it starts the service itself, which brings up
all the resources in the right order without problems.

What could cause these attempts to start some resources before the whole
service is started (and before the node sees itself as up)?


Can you help me identify why the resource dependencies do not work all
the time?

Thanks in advance,
Krisztian

Aug 11 17:27:16 linuxsap1 rgmanager[9801]: I am node #1
Aug 11 17:27:16 linuxsap1 rgmanager[9801]: Resource Group Manager Starting
Aug 11 17:27:16 linuxsap1 rgmanager[9801]: Loading Service Data
Aug 11 17:27:17 linuxsap1 rgmanager[9801]: Initializing Services
Aug 11 17:27:18 linuxsap1 rgmanager[10884]: [SAPInstance] sapstartsrv is
not running for instance PRD-DVEBMGS00, it will be started now
Aug 11 17:27:18 linuxsap1 rgmanager[10911]: [SAPInstance] sapstartsrv
for instance PRD-DVEBMGS00 could not be started!
Aug 11 17:27:18 linuxsap1 rgmanager[10934]: [SAPInstance] SAP Instance
PRD-DVEBMGS00 stop failed:
Aug 11 17:27:18 linuxsap1 rgmanager[10956]: [SAPInstance] Attribute
POST_STOP_USEREXIT is set to /usr/sap/PRD/sapsrvstop.sh, but this file
is not executable
Aug 11 17:27:18 linuxsap1 rgmanager[9801]: stop on SAPInstance
"PRD_DVEBMGS00_sapprd" returned 1 (generic error)
Aug 11 17:27:18 linuxsap1 rgmanager[10997]: [SAPDatabase] Cannot find
startdb,stopdb and R3trans executable, please set DIR_EXECUTABLE parameter!
Aug 11 17:27:18 linuxsap1 rgmanager[9801]: stop on SAPDatabase "PRD"
returned 7 (unspecified)
Aug 11 17:27:18 linuxsap1 rgmanager[11035]: [ip] 10.100.100.104 is not
configured
Aug 11 17:27:18 linuxsap1 rgmanager[11072]: [fs] stop: Could not match
/dev/vg_PRD_trans/lv_PRD_trans with a real device
Aug 11 17:27:18 linuxsap1 rgmanager[9801]: stop on fs "PRD_trans"
returned 2 (invalid argument(s))
Aug 11 17:27:19 linuxsap1 rgmanager[11150]: [fs] stop: Could not match
/dev/vg_PRD_usrsap/lv_PRD_usrsap with a real device
Aug 11 17:27:19 linuxsap1 rgmanager[9801]: stop on fs "PRD_usrsap"
returned 2 (invalid argument(s))
Aug 11 17:27:21 linuxsap1 rgmanager[11227]: [fs] stop: Could not match
/dev/vg_PRD_sapmnt/lv_PRD_sapmnt with a real device
Aug 11 17:27:21 linuxsap1 rgmanager[9801]: stop on fs "PRD_sapmnt"
returned 2 (invalid argument(s))
Aug 11 17:27:22 linuxsap1 rgmanager[9801]: stop on fs "PRD_sapdata1"
returned 2 (invalid argument(s))
Aug 11 17:27:22 linuxsap1 rgmanager[11305]: [fs] stop: Could not match
/dev/vg_PRD_oracle/lv_PRD_sapdata1 with a real device
Aug 11 17:27:22 linuxsap1 rgmanager[11342]: [fs] stop: Could not match
/dev/vg_PRD_oracle/lv_PRD_oraarch with a real device
Aug 11 17:27:22 linuxsap1 rgmanager[9801]: stop on fs "PRD_oraarch"
returned 2 (invalid argument(s))
Aug 11 17:27:22 linuxsap1 rgmanager[11379]: [fs] stop: Could not match
/dev/vg_PRD_oracle/lv_PRD_oralog1 with a real device
Aug 11 17:27:22 linuxsap1 rgmanager[9801]: stop on fs "PRD_oralog1"
returned 2 (invalid argument(s))
Aug 11 17:27:22 linuxsap1 rgmanager[11416]: [fs] stop: Could not match
/dev/vg_PRD_oracle/lv_PRD_oralog2 with a real device
Aug 11 17:27:22 linuxsap1 rgmanager[9801]: stop on fs "PRD_oralog2"
returned 2 (invalid argument(s))
Aug 11 17:27:22 linuxsap1 rgmanager[11453]: [fs] stop: Could not match
/dev/vg_PRD_oracle/lv_PRD_orabin with a real device
Aug 11 17:27:22 linuxsap1 rgmanager[9801]: stop on fs "PRD_orabin"
returned 2 (invalid argument(s))
Aug 11 17:27:24 linuxsap1 rgmanager[9801]: Services Initialized
Aug 11 17:27:24 linuxsap1 rgmanager[9801]: State change: Local UP
Aug 11 17:27:24 linuxsap1 rgmanager[9801]: Starting stopped service
service:SAP-PRD
Aug 11 17:27:25 linuxsap1 rgmanager[11551]: [lvm] Starting volume group,
vg_PRD_oracle
Aug 11 17:27:25 linuxsap1 rgmanager[11580]: [lvm] I can claim this
volume group
Aug 11 17:27:25 linuxsap1 rgmanager[11619]: [lvm] New tag
"linuxsap1-priv" added to vg_PRD_oracle
Aug 11 17:27:26 linuxsap1 rgmanager[11803]: [fs] mounting /dev/dm-13 on
/oracle/PRD
Aug 11 17:27:26 linuxsap1 rgmanager[11825]: [fs] mount -t ext4
/dev/dm-13 /oracle/PRD
Aug 11 17:27:26 linuxsap1 rgmanager[11985]: [fs] mounting /dev/dm-15 on
/oracle/PRD/origlogB
Aug 11 17:27:26 linuxsap1 rgmanager[12007]: [fs] mount -t ext4
/dev/dm-15 /oracle/PRD/origlogB
Aug 11 17:27:26 linuxsap1 kernel: EXT4-fs (dm-15): warning: maximal
mount count reached, running e2fsck is recommended
Aug 11 17:27:26 linuxsap1 kernel: EXT4-fs (dm-15): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:26 linuxsap1 rgmanager[12200]: [fs] mounting /dev/dm-14 on
/oracle/PRD/origlogA
Aug 11 17:27:26 linuxsap1 rgmanager[12222]: [fs] mount -t ext4
/dev/dm-14 /oracle/PRD/origlogA
Aug 11 17:27:27 linuxsap1 kernel: EXT4-fs (dm-14): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:27 linuxsap1 rgmanager[12391]: [fs] mounting /dev/dm-16 on
/oracle/PRD/oraarch
Aug 11 17:27:27 linuxsap1 rgmanager[12413]: [fs] mount -t ext4
/dev/dm-16 /oracle/PRD/oraarch
Aug 11 17:27:27 linuxsap1 kernel: EXT4-fs (dm-16): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:27 linuxsap1 rgmanager[12589]: [fs] mounting /dev/dm-17 on
/oracle/PRD/sapdata1
Aug 11 17:27:27 linuxsap1 rgmanager[12611]: [fs] mount -t ext4
/dev/dm-17 /oracle/PRD/sapdata1
Aug 11 17:27:27 linuxsap1 kernel: EXT4-fs (dm-17): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:28 linuxsap1 rgmanager[12681]: [lvm] Starting volume group,
vg_PRD_sapmnt
Aug 11 17:27:28 linuxsap1 rgmanager[12710]: [lvm] I can claim this
volume group
Aug 11 17:27:28 linuxsap1 rgmanager[12749]: [lvm] New tag
"linuxsap1-priv" added to vg_PRD_sapmnt
Aug 11 17:27:29 linuxsap1 rgmanager[12920]: [fs] mounting /dev/dm-18 on
/sapmnt/PRD
Aug 11 17:27:29 linuxsap1 rgmanager[12942]: [fs] mount -t ext4
/dev/dm-18 /sapmnt/PRD
Aug 11 17:27:29 linuxsap1 kernel: EXT4-fs (dm-18): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:30 linuxsap1 rgmanager[13018]: [lvm] Starting volume group,
vg_PRD_usrsap
Aug 11 17:27:30 linuxsap1 rgmanager[13047]: [lvm] I can claim this
volume group
Aug 11 17:27:30 linuxsap1 rgmanager[13094]: [lvm] New tag
"linuxsap1-priv" added to vg_PRD_usrsap
Aug 11 17:27:31 linuxsap1 rgmanager[13298]: [fs] mounting /dev/dm-19 on
/usr/sap/PRD
Aug 11 17:27:31 linuxsap1 rgmanager[13320]: [fs] mount -t ext4
/dev/dm-19 /usr/sap/PRD
Aug 11 17:27:31 linuxsap1 kernel: EXT4-fs (dm-19): warning: maximal
mount count reached, running e2fsck is recommended
Aug 11 17:27:31 linuxsap1 kernel: EXT4-fs (dm-19): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:32 linuxsap1 rgmanager[13391]: [lvm] Starting volume group,
vg_PRD_trans
Aug 11 17:27:32 linuxsap1 rgmanager[13422]: [lvm] I can claim this
volume group
Aug 11 17:27:32 linuxsap1 rgmanager[13461]: [lvm] New tag
"linuxsap1-priv" added to vg_PRD_trans
Aug 11 17:27:33 linuxsap1 rgmanager[13658]: [fs] mounting /dev/dm-33 on
/usr/sap/transERP
Aug 11 17:27:33 linuxsap1 rgmanager[13681]: [fs] mount -t ext4
/dev/dm-33 /usr/sap/transERP
Aug 11 17:27:33 linuxsap1 kernel: EXT4-fs (dm-33): mounted filesystem
with ordered data mode. Opts:
Aug 11 17:27:33 linuxsap1 kernel: SELinux: initialized (dev dm-33, type
ext4), uses xattr
Aug 11 17:27:33 linuxsap1 rgmanager[13761]: [ip] Link for publicteam1:
Detected
Aug 11 17:27:33 linuxsap1 rgmanager[13783]: [ip] Adding IPv4 address
10.100.100.104/16 to publicteam1
Aug 11 17:27:33 linuxsap1 rgmanager[13805]: [ip] Pinging addr
10.100.100.104 from dev publicteam1
Aug 11 17:27:35 linuxsap1 rgmanager[13832]: [ip] Sending gratuitous ARP:
10.100.100.104 d0:67:e5:ea:0f:a0 brd ff:ff:ff:ff:ff:ff
Aug 11 17:27:36 linuxsap1 su: pam_unix(su-l:session): session opened for
user oraprd by (uid=0)
Aug 11 17:27:37 linuxsap1 su: pam_unix(su-l:session): session closed for
user oraprd
Aug 11 17:27:38 linuxsap1 rgmanager[14005]: [SAPDatabase] Oracle
Listener LIST_PRD started: Warning: no access to tty (Bad file descriptor).
Aug 11 17:27:38 linuxsap1 Thus no job control in this s
Aug 11 17:27:38 linuxsap1 su: pam_unix(su-l:session): session opened for
user prdadm by (uid=0)
Aug 11 17:27:52 linuxsap1 su: pam_unix(su-l:session): session closed for
user prdadm
Aug 11 17:27:52 linuxsap1 rgmanager[14275]: [SAPDatabase] SAP database
PRD started: Trying to start PRD database ...
Aug 11 17:27:52 linuxsap1 Log file: /home/prdadm/startdb.log
Aug 11 17:27:52 linuxsap1 PRD database start
Aug 11 17:27:52 linuxsap1 rgmanager[14333]: [SAPInstance] sapstartsrv is
not running for instance PRD-DVEBMGS00, it will be started now
Aug 11 17:27:53 linuxsap1 SAPPRD_00[14507]: SAP Service SAPPRD_00
successfully started.
Aug 11 17:27:55 linuxsap1 rgmanager[14539]: [SAPInstance] sapstartsrv
for instance PRD-DVEBMGS00 was restarted !
Aug 11 17:27:55 linuxsap1 rgmanager[14702]: [SAPInstance] Starting SAP
Instance PRD-DVEBMGS00:
Aug 11 17:27:55 linuxsap1 11.08.2012 17:27:55
Aug 11 17:27:55 linuxsap1 Start
Aug 11 17:27:55 linuxsap1 OK
Aug 11 17:28:15 linuxsap1 rgmanager[15169]: [SAPInstance] SAP Instance
PRD-DVEBMGS00 started:
Aug 11 17:28:15 linuxsap1 11.08.2012 17:28:15
Aug 11 17:28:15 linuxsap1 WaitforStarted
Aug 11 17:28:15 linuxsap1 OK
Aug 11 17:28:15 linuxsap1 rgmanager[9801]: Service service:SAP-PRD started
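
To separate the two phases in the log above, a small script along these lines might help. It is only a sketch: the marker strings and the "stop on ... returned N" pattern are copied from the log, while the function name `stops_during_init` and the shortened sample are made up for illustration. It lists the stop results that rgmanager logged between "Initializing Services" and "Services Initialized", that is, before the service start began.

```python
import re

# Marker strings copied from the log above; all other names are illustrative.
INIT_START = "Initializing Services"
INIT_END = "Services Initialized"
STOP_RESULT = re.compile(r'stop on (\S+) "([^"]+)"\s+returned (\d+)')


def stops_during_init(log_text):
    """Return (type, name, code) for each stop result logged during initialization."""
    # The mail client wrapped long log records, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    start = flat.find(INIT_START)
    end = flat.find(INIT_END)
    if start == -1 or end == -1 or end < start:
        return []
    window = flat[start:end]
    return [(t, n, int(c)) for t, n, c in STOP_RESULT.findall(window)]


# Shortened excerpt of the log above, including the mail line wrapping.
sample = """
Aug 11 17:27:17 linuxsap1 rgmanager[9801]: Initializing Services
Aug 11 17:27:18 linuxsap1 rgmanager[9801]: stop on SAPInstance
"PRD_DVEBMGS00_sapprd" returned 1 (generic error)
Aug 11 17:27:18 linuxsap1 rgmanager[9801]: stop on fs "PRD_trans"
returned 2 (invalid argument(s))
Aug 11 17:27:24 linuxsap1 rgmanager[9801]: Services Initialized
Aug 11 17:27:24 linuxsap1 rgmanager[9801]: Starting stopped service
service:SAP-PRD
"""

for rtype, name, code in stops_during_init(sample):
    print(f"{rtype} {name}: stop returned {code}")
```

Run against the full log, this would show that every failed operation in the first phase is a stop, and that none of them occurs after "Starting stopped service".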



The cluster.conf is as follows:

<?xml version="1.0"?>
<cluster config_version="167" name="linuxsap">
        <clusternodes>
                <clusternode name="linuxsap1-priv" nodeid="1">
                        <fence>
                                <method name="scsi">
                                        <device key="1" name="scsi_dev"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" key="1"
name="scsi_dev"/>
                        </unfence>
                </clusternode>
                <clusternode name="linuxsap2-priv" nodeid="2">
                        <fence>
                                <method name="scsi">
                                        <device key="2" name="scsi_dev"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" key="2"
name="scsi_dev"/>
                        </unfence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3" transport="udpu"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="FOD-SAP" nofailback="1"
ordered="1" restricted="0">
                                <failoverdomainnode
name="linuxsap1-priv" priority="1"/>
                                <failoverdomainnode
name="linuxsap2-priv" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="FOD-Oracle" nofailback="1"
ordered="1" restricted="0">
                                <failoverdomainnode
name="linuxsap1-priv" priority="2"/>
                                <failoverdomainnode
name="linuxsap2-priv" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="FOD-LinuxSap1"
nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="linuxsap1-priv"/>
                        </failoverdomain>
                        <failoverdomain name="FOD-LinuxSap2"
nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="linuxsap2-priv"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <lvm name="vg_PRD_oracle" vg_name="vg_PRD_oracle"/>
                        <lvm name="vg_PRD_sapmnt" vg_name="vg_PRD_sapmnt"/>
                        <lvm name="vg_PRD_usrsap" vg_name="vg_PRD_usrsap"/>
                        <lvm name="vg_PRD_trans" vg_name="vg_PRD_trans"/>
                        <fs device="/dev/vg_PRD_oracle/lv_PRD_orabin"
force_unmount="1" fstype="ext4" mountpoint="/oracle/PRD" name="PRD_orabin"/>
                        <fs device="/dev/vg_PRD_oracle/lv_PRD_oralog1"
force_unmount="1" fstype="ext4" mountpoint="/oracle/PRD/origlogA"
name="PRD_oralog1"/>
                        <fs device="/dev/vg_PRD_oracle/lv_PRD_oralog2"
force_unmount="1" fstype="ext4" mountpoint="/oracle/PRD/origlogB"
name="PRD_oralog2"/>
                        <fs device="/dev/vg_PRD_oracle/lv_PRD_oraarch"
force_unmount="1" fstype="ext4" mountpoint="/oracle/PRD/oraarch"
name="PRD_oraarch"/>
                        <fs device="/dev/vg_PRD_oracle/lv_PRD_sapdata1"
force_unmount="1" fstype="ext4" mountpoint="/oracle/PRD/sapdata1"
name="PRD_sapdata1"/>
                        <fs device="/dev/vg_PRD_sapmnt/lv_PRD_sapmnt"
force_unmount="1" fstype="ext4" mountpoint="/sapmnt/PRD" name="PRD_sapmnt"/>
                        <fs device="/dev/vg_PRD_usrsap/lv_PRD_usrsap"
force_unmount="1" fstype="ext4" mountpoint="/usr/sap/PRD"
name="PRD_usrsap"/>
                        <fs device="/dev/vg_PRD_trans/lv_PRD_trans"
force_unmount="1" fstype="ext4" mountpoint="/usr/sap/transERP"
name="PRD_trans"/>
                        <ip address="10.100.100.104" monitor_link="on"
sleeptime="10"/>
                        <SAPInstance DIR_EXECUTABLE="/sapmnt/PRD/exe"
DIR_PROFILE="/sapmnt/PRD/profile" InstanceName="PRD_DVEBMGS00_sapprd"
POST_STOP_USEREXIT="/usr/sap/PRD/sapsrvstop.sh"
START_PROFILE="/sapmnt/PRD/profile/START_DVEBMGS00_sapprd"
START_WAITTIME="60"/>
                        <SAPDatabase DBTYPE="ORA"
DIR_EXECUTABLE="/sapmnt/PRD/exe" NETSERVICENAME="LIST_PRD"
POST_STOP_USEREXIT="/usr/sap/PRD/sapsrvstop.sh" SID="PRD"/>
                        <fs device="/dev/vg_teszt_10GB/lv_teszt_10GB"
force_unmount="1" fsid="1886" fstype="ext4" mountpoint="/teszt"
name="Teszt_10GB"/>
                        <lvm name="vg_teszt_10GB" vg_name="vg_teszt_10GB"/>
                </resources>
                <service domain="FOD-SAP" name="SAP-PRD"
recovery="relocate">
                        <lvm ref="vg_PRD_oracle">
                                <fs ref="PRD_orabin">
                                        <fs ref="PRD_oralog2"/>
                                        <fs ref="PRD_oralog1"/>
                                        <fs ref="PRD_oraarch"/>
                                        <fs ref="PRD_sapdata1"/>
                                </fs>
                        </lvm>
                        <lvm ref="vg_PRD_sapmnt">
                                <fs ref="PRD_sapmnt"/>
                        </lvm>
                        <lvm ref="vg_PRD_usrsap">
                                <fs ref="PRD_usrsap"/>
                        </lvm>
                        <lvm ref="vg_PRD_trans">
                                <fs ref="PRD_trans"/>
                        </lvm>
                        <ip ref="10.100.100.104"/>
                        <SAPDatabase ref="PRD">
                                <SAPInstance ref="PRD_DVEBMGS00_sapprd"/>
                        </SAPDatabase>
                </service>
        </rm>
        <dlm enable_deadlk="1" enable_quorum="1"/>
        <quorumd label="qdisk_dev"/>
        <fencedevices>
                <fencedevice agent="fence_scsi" aptpl="1"
devices="/dev/mapper/36006016057a01e006226605213c4e111,/dev/mapper/36006016057a01e0080664e5d9fa4e111,/dev/mapper/36006016057a01e00c88c499a9ea4e111,/dev/mapper/36006016057a01e00ecf4bd78beafe111"
logfile="/var/log/cluster/fence_scsi.log" name="scsi_dev"/>
        </fencedevices>
</cluster>
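
For comparison with the log above, the nesting of the <service> block can be printed with a short script. This is a rough sketch that assumes a plain depth-first walk in document order; it does not reproduce rgmanager's own ordering rules, although the result does match the start order seen in the log. `SERVICE_XML` is a trimmed copy of the service definition and `walk` is a helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <service> block above (illustrative subset).
SERVICE_XML = """
<service domain="FOD-SAP" name="SAP-PRD" recovery="relocate">
  <lvm ref="vg_PRD_oracle">
    <fs ref="PRD_orabin">
      <fs ref="PRD_oralog2"/>
      <fs ref="PRD_oralog1"/>
    </fs>
  </lvm>
  <ip ref="10.100.100.104"/>
  <SAPDatabase ref="PRD">
    <SAPInstance ref="PRD_DVEBMGS00_sapprd"/>
  </SAPDatabase>
</service>
"""


def walk(node, depth=0):
    """Yield (depth, type, ref) for each resource, parents before children."""
    for child in node:
        yield depth, child.tag, child.get("ref")
        yield from walk(child, depth + 1)


service = ET.fromstring(SERVICE_XML.strip())
tree = list(walk(service))
for depth, rtype, ref in tree:
    # Indentation shows which resource is nested under which parent.
    print("  " * depth + f"{rtype}:{ref}")
```

In this walk the SAPInstance appears only after the SAPDatabase and all file systems, which is the order I expected the cluster to follow at all times.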

