[Linux-cluster] Clustering tomcat
Digimer
lists at alteeve.ca
Wed Apr 11 15:44:19 UTC 2012
On 04/11/2012 07:24 AM, Sadvary, Bill wrote:
>
> Hi,
>
> I'm having some difficulty getting a tomcat cluster service up and running with Centos v6.2 and Tomcat6.
>
> The service won't start tomcat and it keeps ping-ponging back and forth between the servers every 30 seconds.
>
> Below is the cluster.conf file, "messages" file and the rgmanager.log
>
> Any help would be appreciated.
>
> Thanks,
> -Bill
>
>
> Here's my cluster.conf
> ---------------------------
>
> <?xml version="1.0"?>
> <cluster config_version="11" name="AUTHCLUSTERDEV">
> <cman expected_votes="1" two_node="1"/>
> <clusternodes>
> <clusternode name="AUTHCLUSTER1DEV" nodeid="1">
> <fence>
> <method name="single"/>
> </fence>
> </clusternode>
> <clusternode name="AUTHCLUSTER2DEV" nodeid="2">
> <fence>
> <method name="single"/>
> </fence>
> </clusternode>
> </clusternodes>
> <rm>
> <failoverdomains>
> <failoverdomain name="failoverDom" nofailback="1" ordered="0" restricted="0">
> <failoverdomainnode name="AUTHCLUSTER1DEV" priority="1"/>
> <failoverdomainnode name="AUTHCLUSTER2DEV" priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <resources>
> <ip address="172.16.223.69" monitor_link="1"/>
> <tomcat-6 config_file="/etc/tomcat6/tomcat6.conf" name="tomcat6" shutdown_wait="30"/>
> </resources>
> <service domain="failoverDom" name="ipservice" recovery="relocate">
> <ip ref="172.16.223.69">
> <tomcat-6 ref="tomcat6"/>
> </ip>
> </service>
> </rm>
> <logging debug="on"/>
> </cluster>
>
> Here's the "messages" file after one full cycle of ping-pongs
> ------------------------------------------------------------------------
> Apr 10 10:09:44 DKNAUTH1DEV rgmanager[2191]: Service service:ipservice is now running on member 2
> Apr 10 10:10:55 DKNAUTH1DEV rgmanager[2191]: Recovering failed service service:ipservice
> Apr 10 10:10:56 DKNAUTH1DEV rgmanager[8695]: [ip] Adding IPv4 address 172.16.223.69/28 to eth2
> Apr 10 10:11:00 DKNAUTH1DEV rgmanager[8837]: [tomcat-6] Starting Service tomcat-6:tomcat6
> Apr 10 10:11:00 DKNAUTH1DEV ntpd[1938]: Listening on interface #81 eth2, 172.16.223.69#123 Enabled
> Apr 10 10:11:01 DKNAUTH1DEV rgmanager[2191]: Service service:ipservice started
> Apr 10 10:12:09 DKNAUTH1DEV rgmanager[9694]: [tomcat-6] Checking Existence Of File /var/run/cluster/tomcat-6/tomcat-6:tomcat6.pid [tomcat-6:tomcat6] > Failed
> Apr 10 10:12:09 DKNAUTH1DEV rgmanager[9714]: [tomcat-6] Monitoring Service tomcat-6:tomcat6 > Service Is Not Running
> Apr 10 10:12:09 DKNAUTH1DEV rgmanager[2191]: status on tomcat-6 "tomcat6" returned 1 (generic error)
> Apr 10 10:12:09 DKNAUTH1DEV rgmanager[2191]: Stopping service service:ipservice
> Apr 10 10:12:09 DKNAUTH1DEV rgmanager[9805]: [tomcat-6] Stopping Service tomcat-6:tomcat6
> Apr 10 10:12:10 DKNAUTH1DEV rgmanager[9825]: [tomcat-6] Checking Existence Of File /var/run/cluster/tomcat-6/tomcat-6:tomcat6.pid [tomcat-6:tomcat6] > Failed - File Doesn'
> Apr 10 10:12:10 DKNAUTH1DEV rgmanager[9845]: [tomcat-6] Stopping Service tomcat-6:tomcat6 > Succeed
> Apr 10 10:12:10 DKNAUTH1DEV rgmanager[9896]: [ip] Removing IPv4 address 172.16.223.69/28 from eth2
> Apr 10 10:12:11 DKNAUTH1DEV ntpd[1938]: Deleting interface #81 eth2, 172.16.223.69#123, interface stats: received=0, sent=0, dropped=0, active_time=71 secs
> Apr 10 10:12:20 DKNAUTH1DEV rgmanager[2191]: Service service:ipservice is recovering
> Apr 10 10:12:24 DKNAUTH1DEV rgmanager[2191]: Service service:ipservice is now running on member 2
>
> The rgmanager.log for the same time duration
> --------------------------------------------------------
> Apr 10 10:09:44 rgmanager Service service:ipservice is now running on member 2
> Apr 10 10:09:49 rgmanager 2 events processed
> Apr 10 10:10:55 rgmanager Recovering failed service service:ipservice
> Apr 10 10:10:56 rgmanager [ip] Link for eth2: Detected
> Apr 10 10:10:56 rgmanager [ip] Adding IPv4 address 172.16.223.69/28 to eth2
> Apr 10 10:10:56 rgmanager [ip] Pinging addr 172.16.223.69 from dev eth2
> Apr 10 10:10:59 rgmanager [ip] Sending gratuitous ARP: 172.16.223.69 00:15:5d:98:91:05 brd ff:ff:ff:ff:ff:ff
> Apr 10 10:11:00 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6
> Apr 10 10:11:00 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6 > Succeed
> Apr 10 10:11:00 rgmanager [tomcat-6] Starting Service tomcat-6:tomcat6
> Apr 10 10:11:00 rgmanager 1 events processed
> Apr 10 10:11:00 rgmanager [tomcat-6] Looking For IP Addresses
> Apr 10 10:11:01 rgmanager [tomcat-6] 1 IP addresses found for ipservice/tomcat6
> Apr 10 10:11:01 rgmanager [tomcat-6] Looking For IP Addresses > Succeed - IP Addresses Found
> Apr 10 10:11:01 rgmanager [tomcat-6] Checking: SHA1 checksum of config file /tomcat-6/tomcat-6:tomcat6/conf/server.xml
> Apr 10 10:11:01 rgmanager [tomcat-6] Checking: SHA1 checksum > succeed
> Apr 10 10:11:01 rgmanager [tomcat-6] Generating New Config File /tomcat-6/tomcat-6:tomcat6/conf/server.xml From /usr/share/tomcat6/conf/server.xml
> Apr 10 10:11:01 rgmanager [tomcat-6] Generating New Config File /tomcat-6/tomcat-6:tomcat6/conf/server.xml From /usr/share/tomcat6/conf/server.xml > Suc
> Apr 10 10:11:01 rgmanager [tomcat-6] Starting Service tomcat-6:tomcat6 > Succeed
> Apr 10 10:11:01 rgmanager Service service:ipservice started
> Apr 10 10:11:07 rgmanager 1 events processed
> Apr 10 10:11:29 rgmanager [ip] Checking 172.16.223.69, Level 0
> Apr 10 10:11:29 rgmanager [ip] 172.16.223.69 present on eth2
> Apr 10 10:11:29 rgmanager [ip] Link for eth2: Detected
> Apr 10 10:11:29 rgmanager [ip] Link detected on eth2
> Apr 10 10:11:49 rgmanager [ip] Checking 172.16.223.69, Level 0
> Apr 10 10:11:49 rgmanager [ip] 172.16.223.69 present on eth2
> Apr 10 10:11:49 rgmanager [ip] Link for eth2: Detected
> Apr 10 10:11:49 rgmanager [ip] Link detected on eth2
> Apr 10 10:12:09 rgmanager [ip] Checking 172.16.223.69, Level 10
> Apr 10 10:12:09 rgmanager [ip] 172.16.223.69 present on eth2
> Apr 10 10:12:09 rgmanager [ip] Link for eth2: Detected
> Apr 10 10:12:09 rgmanager [ip] Link detected on eth2
> Apr 10 10:12:09 rgmanager [ip] Local ping to 172.16.223.69 succeeded
> Apr 10 10:12:09 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6
> Apr 10 10:12:09 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6 > Succeed
> Apr 10 10:12:09 rgmanager [tomcat-6] Monitoring Service tomcat-6:tomcat6
> Apr 10 10:12:09 rgmanager [tomcat-6] Checking Existence Of File /var/run/cluster/tomcat-6/tomcat-6:tomcat6.pid [tomcat-6:tomcat6] > Failed
> Apr 10 10:12:09 rgmanager [tomcat-6] Monitoring Service tomcat-6:tomcat6 > Service Is Not Running
> Apr 10 10:12:09 rgmanager status on tomcat-6 "tomcat6" returned 1 (generic error)
> Apr 10 10:12:09 rgmanager Stopping service service:ipservice
> Apr 10 10:12:09 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6
> Apr 10 10:12:09 rgmanager [tomcat-6] Verifying Configuration Of tomcat-6:tomcat6 > Succeed
> Apr 10 10:12:09 rgmanager [tomcat-6] Stopping Service tomcat-6:tomcat6
> Apr 10 10:12:10 rgmanager [tomcat-6] Checking Existence Of File /var/run/cluster/tomcat-6/tomcat-6:tomcat6.pid [tomcat-6:tomcat6] > Failed - File Doesn'
> Apr 10 10:12:10 rgmanager [tomcat-6] Stopping Service tomcat-6:tomcat6 > Succeed
> Apr 10 10:12:10 rgmanager [ip] Removing IPv4 address 172.16.223.69/28 from eth2
> Apr 10 10:12:20 rgmanager Service service:ipservice is recovering
> Apr 10 10:12:20 rgmanager Sent remote-start request to 2
> Apr 10 10:12:24 rgmanager Service service:ipservice is now running on member 2
> Apr 10 10:12:29 rgmanager 2 events processed
> Apr 10 10:12:39 rgmanager Forwarding req. to AUTHCLUSTER2DEV.
> Apr 10 10:12:40 rgmanager FW: Forwarding disable request to 2
> Apr 10 10:12:55 rgmanager 1 events processed
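
I've not used tomcat (or its RA), so I can't speak to it specifically.
It looks like the RA is returning a bad exit code, though. Your logs
show the start reporting success, but the first status check about a
minute later fails because the PID file
/var/run/cluster/tomcat-6/tomcat-6:tomcat6.pid does not exist. If you
look at /usr/share/cluster/tomcat-6.sh, you might be able to suss out
what it is failing on.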

As an aside, you need a proper fence device. As it is now, a node
failure will hang your cluster, because the 'single' fence method has
no device defined from what I can see. Have you tested a node failure?
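
For reference, here is a rough sketch of what a populated 'single'
method could look like. It assumes IPMI-based fencing through the
fence_ipmilan agent, which may not match your hardware. The device
names, IP addresses, login and password are all made-up placeholders,
and the <rm> section is left out for brevity. Treat it as an
illustration of the layout, not a tested config.

```xml
<?xml version="1.0"?>
<!-- Sketch only. config_version must be higher than the running one. -->
<cluster config_version="12" name="AUTHCLUSTERDEV">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="AUTHCLUSTER1DEV" nodeid="1">
      <fence>
        <method name="single">
          <!-- Points at a fencedevice defined below (hypothetical name) -->
          <device name="ipmi_node1" action="reboot"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="AUTHCLUSTER2DEV" nodeid="2">
      <fence>
        <method name="single">
          <device name="ipmi_node2" action="reboot"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- Placeholder addresses and credentials; the agent is an assumption -->
    <fencedevice agent="fence_ipmilan" name="ipmi_node1" ipaddr="192.0.2.11" login="admin" passwd="secret"/>
    <fencedevice agent="fence_ipmilan" name="ipmi_node2" ipaddr="192.0.2.12" login="admin" passwd="secret"/>
  </fencedevices>
  <!-- existing <rm> ... </rm> section goes here, unchanged -->
</cluster>
```

The point is that each <method> needs at least one <device> that refers
to an entry under <fencedevices>. Which agent is appropriate depends on
what your nodes actually run on.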
--
Digimer
Papers and Projects: https://alteeve.com