[Linux-cluster] Web services 2 node cluster

Tue May 16 13:06:10 UTC 2006

Good Day everyone.

I have read archives but could not find answer for my case.

Situation:
2 node cluster with 5-6 web demons (Apache) running on virtual IP,
system is RedHat Enterprise 4 (Nahant Update 2).

cluster.conf looks like:
---------------- cut ----------------
<?xml version="1.0" ?>
<cluster config_version="9" name="www-cluster">
        <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="emisweb04" votes="1">
                        <multicast addr="225.0.0.13" interface="eth0"/>
                        <fence/>
                </clusternode>
                <clusternode name="emisweb02" votes="1">
                        <multicast addr="225.0.0.13" interface="eth0"/>
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.13"/>
        </cman>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="EMIS_Service" ordered="0" restricted="0">
                                <failoverdomainnode name="emisweb04" priority="1"/>
                                <failoverdomainnode name="emisweb02" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.200.26" monitor_link="1"/>
                        <script file="/etc/rc.d/init.d/apache" name="ISI Web Server"/>
                </resources>
                <service autostart="1" domain="EMIS_Service" exclusive="1" name="EMIS Web Server">
                        <ip ref="192.168.200.26"/>
                        <script ref="Multi Web Server"/>
                </service>
        </rm>
</cluster>

---------------- cut ----------------

It generally comes up and runs. Everything is on local network then no
router is involved. 
Worse comes when i try give it stresstest like shut down interface eth0.
With one httpd demon running service switches to failover node in most
casea. Generally cluster works for 90% of what I expect from it.

But with more web demons up it is not functioning properly.
Possibly /etc/rc.d/init.d/apache is written in a way improper for
cluster as it gives result:

[root at emisweb04 ~]# service apache status
httpd (pid 3344 3343 3342 3341 3340 3339 3338 3337 3336 3335 3334 3333
3332 3331 3316 3315 3314 3313 3312 3311 3310 3309 3308 3307 3306 3305
3304 3303 3302 3301 3300 3292 3291 3290 3289 3288 3287 3286 3285 3284
3283 3279 3278 3277 3276 3275 3274 3273 3272 3271 3270 3269 3268 3267
3262 3257 3252) is running...

I may kill some of demons but status will not change - many processes
will be shown as running then cluster is fooled. It doesn't react 
and does not bring killed demons back up.

Well - this is easy to resolve - I may split demons for separate services.

Worse thing is that after bringing eth0 back up multicast addres
doesn't come up and cluster demons on this node need to be restarted.

Yet more worse is that when i try to stop cluster - cman processes 
remain and in a hard way - it is not possible to kill them:

They are:
     |-cman_comms
     |-cman_hbeat
     |-cman_memb
     |-events/3---cman_serviced

Then virtual IP is not released and custer cannot me started back:
/var/log/messages:
dlm: process_cluster_request invalid lockspace 1000003 from 2 req 1
dlm: process_cluster_request invalid lockspace 1000003 from 2 req 1

Must I set fencing ? 
(for now it is not set). I'd prefer not to get into reboot loop if 
something goes wrong.

Any hint appreciated ...
May Power be with you.
-- 

          Czeslaw M <system_admin at pah156.warszawa.sdi.tpnet.pl>