[Linux-cluster] virtual address went down?

Wed Oct 18 19:29:36 UTC 2006

Since no one responded to my question, I tried stopping the services on 
the first box:
service rgmanager stop
service gfs stop
service clvmd stop
service fenced stop
service cman stop
service ccsd stop
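
For reference, the stop sequence above and a matching start sequence can be sketched as a small script. Starting in the exact reverse order is an assumption here, and the DRY_RUN variable and the function names are made up for illustration; this is not something verified on this cluster.

```shell
#!/bin/sh
# Sketch only: stop the cluster daemons in the order listed above and
# start them again in the reverse order (the reverse order is an assumption).
SERVICES="rgmanager gfs clvmd fenced cman ccsd"

run() {
    # With DRY_RUN=1 (hypothetical switch), print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

stop_all() {
    for svc in $SERVICES; do
        run service "$svc" stop || return 1
    done
}

start_all() {
    # Build the reversed list so ccsd comes up first and rgmanager last.
    reversed=""
    for svc in $SERVICES; do
        reversed="$svc $reversed"
    done
    for svc in $reversed; do
        run service "$svc" start || return 1
    done
}
```

Running `DRY_RUN=1` followed by `stop_all` and `start_all` only prints the commands, which is a safe way to check the order before touching a live node.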

Everything came down fine.
Then I started them back up:
service ccsd start
This seemed to hang for about 2 minutes, then I got a panic, 
as shown in the attached graphic.

This is on Red Hat Enterprise Linux AS release 4 (Nahant Update 4), 
kernel 2.6.9-34.ELsmp, running:
ccs-1.0.3-0
cman-kernel-hugemem-2.6.9-43.8
cman-kernel-2.6.9-43.8
cman-1.0.4-0
cman-kernel-smp-2.6.9-43.8
cman-kernheaders-2.6.9-43.8

built from sources.

Here's my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="22" name="progressive">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="tf1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="2" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="2" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="tf2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="4" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="4" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="xxx" name="apc_power_switch" passwd="xxx"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="httpd" ordered="1" restricted="1">
                                <failoverdomainnode name="tf1" priority="1"/>
                                <failoverdomainnode name="tf2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/httpd" name="cluster_apache"/>
                        <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
                        <ip address="192.168.1.7" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="httpd" name="Apache Service">
                        <ip ref="192.168.1.7"/>
                        <script ref="cluster_apache"/>
                        <fs ref="apache_content"/>
                </service>
        </rm>
</cluster>

Oh, and shortly after the first box came back up, the second one got 
rebooted automatically (power fenced by the first one, I'm guessing) for 
good measure.

But now the virtual address is working again.

Any help appreciated.

Jason

On Tue, Oct 17, 2006 at 09:37:15PM -0400, jason at monsterjam.org wrote:
> So I've had a test cluster running for quite a while now. Both nodes of the 2-node cluster are up, 
> but the virtual address seems to have disappeared. It's not pingable, and neither server has it 
> configured anymore. The only application I had using the virtual address was Apache (just for 
> testing it). What logs/information should I be looking at to see what happened and why?
> 
> regards,
> Jason

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================