[Linux-cluster] newbie questions

Sat Jul 1 04:17:33 UTC 2006

I see you figured out your multiple ports fencing issue.  Good, that 
saves me a rant about system-config-cluster ... ;-)

First thing to test is that you can configure the IP address manually, 
mount the filesystem, and start apache "the old-fashioned way", using 
the /etc/init.d/httpd script on either machine.
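
For what it's worth, here is a rough sketch of that manual test as a pair of shell functions. The address, device and mount point come from the cluster.conf you posted; the /24 prefix, the eth0 interface and the gfs filesystem type are my assumptions, so adjust them to match your setup. The run helper and the DRY_RUN variable are purely illustrative, so that you can print the commands before running them as root.

```shell
#!/bin/sh
# Values taken from the posted cluster.conf; PREFIX and IFACE are assumptions.
ADDR=192.168.1.7
PREFIX=24
IFACE=eth0
DEVICE=/dev/mapper/diskarray-lv1
MOUNTPOINT=/mnt/gfs/htdocs

# run (hypothetical helper): print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring things up in the order apache needs them: IP, filesystem, httpd.
# Stops at the first step that fails.
manual_start() {
    run ip addr add "$ADDR/$PREFIX" dev "$IFACE" &&
    run mount -t gfs "$DEVICE" "$MOUNTPOINT" &&
    run /etc/init.d/httpd start
}

# Undo everything in reverse order once the manual test is done.
manual_stop() {
    run /etc/init.d/httpd stop
    run umount "$MOUNTPOINT"
    run ip addr del "$ADDR/$PREFIX" dev "$IFACE"
}
```

Setting DRY_RUN=1 and calling manual_start only prints the three commands. Without it, manual_start runs them for real, and manual_stop cleans up afterwards so the node is back in its original state.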

If that works, then I'd guess your problem with the cluster service is 
that the <ip> resource needs to be listed before the <script> 
resource inside the <service> block, since apache will bomb if the IP 
address you told it to bind to isn't present (and I assume apache is 
configured to bind to that address).  If that's the case, then you 
should see an error about it in the apache error_log.

As for nothing being logged about the cluster service trying to 
start: it SHOULD be logging to /var/log/messages, but I've seen some 
weirdness with this in the past.  A healthy cluster node should show 
something like this when the service starts:

Jun 22 09:36:51 knob clurgmgrd[3652]: <notice> Starting stopped service maps_ip
Jun 22 09:36:51 knob clurgmgrd: [3652]: <info> Adding IPv4 address x.y.8.60 to eth0
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Service maps_ip started
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Starting stopped service httpd
Jun 22 09:36:52 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd start
Jun 22 09:36:54 knob httpd: httpd startup succeeded
Jun 22 09:36:54 knob clurgmgrd[3652]: <notice> Service httpd started

(I always find the concept of "starting" an IP address faintly 
hilarious), and then you should see something like:

Jun 22 09:37:33 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd status

every 30 seconds or so.

That brings me to an important point - the apache init script doesn't 
follow whatever standard Red Hat init scripts are supposed to follow 
(there's a thread about this that I was involved in 6-9 months back), 
with respect to the status command.  At least, it didn't at the time; 
maybe they've fixed it (I hope, by now).  The stop action return(s/ed) 
non-zero (failure) if apache wasn't running.  If the cluster manager 
thinks the service has failed, it will first try to stop it before 
starting it.  If the apache script returns failure on the attempt to 
stop it because it was already stopped, then the cluster manager will 
think something's wrong and never try to start it.  The upshot is that 
you have to hack the init script to make it return 0 in this 
situation.  I took the cop-out approach of just forcing it to always 
return 0:

  stop() {
          echo -n $"Stopping $prog: "
          killproc $httpd
-        RETVAL=$?
+        RETVAL=0 # makes cluster admin less crazy
          echo
          [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
  }

which should be safe enough (if killproc fails to kill it you've 
probably got bigger problems on your hands), but could be better. 
Someone else may have posted a better patch on this list; check the 
archives.
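
As one idea of what "better" could look like, here is a minimal sketch. It is my own illustration, not the patch from the archives and not anything Red Hat ships. It treats "already stopped" as a successful stop but still reports a genuine failure to kill a running httpd. The is_running helper is hypothetical, the default pidfile and lockfile paths are assumptions, and killproc and $httpd are expected to come from the real init script (which sources /etc/init.d/functions).

```shell
#!/bin/sh
# Sketch only. pidfile/lockfile defaults are assumptions; the real init
# script sets its own values, along with $httpd and killproc.
pidfile=${pidfile:-/var/run/httpd.pid}
lockfile=${lockfile:-/var/lock/subsys/httpd}

# is_running (hypothetical helper): true if the pidfile names a live process.
is_running() {
    [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

stop() {
    if ! is_running; then
        # Nothing to kill: remove stale files and report success, so the
        # cluster manager's stop-before-start attempt does not fail.
        rm -f "$lockfile" "$pidfile"
        return 0
    fi
    # httpd really is running, so pass killproc's verdict through unchanged.
    killproc "$httpd"
    RETVAL=$?
    [ "$RETVAL" -eq 0 ] && rm -f "$lockfile" "$pidfile"
    return "$RETVAL"
}
```

The difference from forcing RETVAL=0 is that a stop which truly fails on a running apache still returns non-zero, so the cluster manager gets to hear about it.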

I just checked a fresh install of httpd on a fully updated AS 4 box, 
and the script is still the same.  Convenient, since httpd is the 
specific example service used for setting up a cluster service in the 
Cluster Suite docs.  ;-)

I hope this helps - I'll stop rambling now.

Oh, one other thing - if the filesystem is GFS, why bother 
mounting/unmounting at all?  Just have it mounted in fstab, or make it a 
separate cluster service if you want the extra assurance that it'll stay 
mounted.

-g

Jason wrote:
> ok, one last question, I hope... I'm following the directions at
> http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-apache-inshttpd.html
> to set up apache as a test... and I cannot see that apache gets started on either of my
> cluster nodes (only 2).
> The IP address I've configured it with is an unused IP address in the subnet that both boxes
> are on. How/where can I troubleshoot this? I don't see anything in the logs about the service
> trying to start.  Here is my cluster.conf:
> 
> 
> <?xml version="1.0"?>
> <cluster config_version="22" name="progressive">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="tf1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="apc_power_switch" option="off" port="1" switch="1"/>
>                                         <device name="apc_power_switch" option="off" port="2" switch="1"/>
>                                         <device name="apc_power_switch" option="on" port="1" switch="1"/>
>                                         <device name="apc_power_switch" option="on" port="2" switch="1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="tf2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="apc_power_switch" option="off" port="3" switch="1"/>
>                                         <device name="apc_power_switch" option="off" port="4" switch="1"/>
>                                         <device name="apc_power_switch" option="on" port="3" switch="1"/>
>                                         <device name="apc_power_switch" option="on" port="4" switch="1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="apc" name="apc_power_switch" passwd="apc"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="httpd" ordered="1" restricted="1">
>                                 <failoverdomainnode name="tf1" priority="1"/>
>                                 <failoverdomainnode name="tf2" priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <script file="/etc/init.d/httpd" name="cluster_apache"/>
>                         <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
>                         <ip address="192.168.1.7" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="httpd" name="Apache Service">
>                         <script ref="cluster_apache"/>
>                         <fs ref="apache_content"/>
>                         <ip ref="192.168.1.7"/>
>                 </service>
>         </rm>
> </cluster>
> 
> 
> Ooh, the other thing is that I had to lie about the filesystem type it lives on; it only gave
> me the ext2/ext3 options (I chose ext3), but it's on a GFS partition.
> 
> Jason