[Linux-cluster] Re: Linux-cluster Digest, Vol 43, Issue 37
jialisong at datuu.com
Wed Nov 28 01:33:14 UTC 2007
[Original message garbled in the archive; it asks a question about GFS 6.1 and fence configuration.]
----- Original Message -----
From: <linux-cluster-request at redhat.com>
To: <linux-cluster at redhat.com>
Sent: Wednesday, November 28, 2007 1:01 AM
Subject: Linux-cluster Digest, Vol 43, Issue 37
> Send Linux-cluster mailing list submissions to
> linux-cluster at redhat.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
> linux-cluster-request at redhat.com
>
> You can reach the person managing the list at
> linux-cluster-owner at redhat.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
> 1. Re: Tests to demonstrate Red Hat Cluster Behaviour (Lon Hohberger)
> 2. Re: Problems to start only one cluster service (carlopmart)
> 3. Re: Re: CS4 : problem with multiple IP addresses (Alain Moulle)
> 4. Re: Re: Re: CS4 : problem with multiple IP addresses
> (Patrick Caulfield)
> 5. Any thoughts on losing mount? (isplist at logicore.net)
> 6. Re: Service Recovery Failure (Scott Becker)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Nov 2007 10:36:15 -0500
> From: Lon Hohberger <lhh at redhat.com>
> Subject: Re: [Linux-cluster] Tests to demonstrate Red Hat Cluster
> Behaviour
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID:
> <1196177775.12646.48.camel at ayanami.boston.devel.redhat.com>
> Content-Type: text/plain
>
> On Mon, 2007-11-26 at 20:14 -0500, Eric Kerin wrote:
>> Scott,
>>
>> Not sure if it works with GFS (I would assume so, but I don't have it
>> installed to test), but normally you would run the following to remount
>> an already mounted filesystem in read-only mode:
>> mount -o remount,ro <mountpoint>
>>
>> And conversely to remount read-write:
>> mount -o remount,rw <mountpoint>
>
> Should be the same w/ GFS.
>
> -- Lon
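Eric's remount commands can be wrapped in a quick sanity check. This is only a sketch: the mount point is hypothetical, and the helper simply reads the first option flag (ro or rw) recorded for a given mount in /proc/mounts.

```shell
# mount_flag prints the first mount option (ro or rw) recorded for a
# mount point; it defaults to /proc/mounts but accepts a test file.
mount_flag() {
    awk -v m="$1" '$2 == m { split($4, o, ","); print o[1] }' "${2:-/proc/mounts}"
}

MNT=/mnt/gfs                    # assumption: adjust to your GFS mount point
# mount -o remount,ro "$MNT"    # flip to read-only (needs root)
# mount_flag "$MNT"             # would print: ro
# mount -o remount,rw "$MNT"    # and back to read-write
```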
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Nov 2007 16:43:36 +0100
> From: carlopmart <carlopmart at gmail.com>
> Subject: Re: [Linux-cluster] Problems to start only one cluster service
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <474C3B28.6090505 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Lon Hohberger wrote:
>> On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
>>> Hi all
>>>
>>> I have a very strange problem. I have configured three nodes under RHCS on
>>> rhel5.1 servers. All works ok, except for one service that never starts when
>>> rgmanager starts up. My cluster conf is:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="RhelXenCluster" config_version="17" name="RhelXenCluster">
>>> <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>> <clusternodes>
>>> <clusternode name="rhelclu01.hpulabs.org" nodeid="1" votes="1">
>>> <fence>
>>> <method name="1">
>>> <device name="gnbd-fence"
>>> nodename="rhelclu01.hpulabs.org"/>
>>> </method>
>>> </fence>
>>> <multicast addr="239.192.75.55" interface="eth0"/>
>>> </clusternode>
>>> <clusternode name="rhelclu02.hpulabs.org" nodeid="2" votes="1">
>>> <fence>
>>> <method name="1">
>>> <device name="gnbd-fence"
>>> nodename="rhelclu02.hpulabs.org"/>
>>> </method>
>>> </fence>
>>> <multicast addr="239.192.75.55" interface="eth0"/>
>>> </clusternode>
>>> <clusternode name="rhelclu03.hpulabs.org" nodeid="3" votes="1">
>>> <fence>
>>> <method name="1">
>>> <device name="gnbd-fence"
>>> nodename="rhelclu03.hpulabs.org"/>
>>> </method>
>>> </fence>
>>> <multicast addr="239.192.75.55" interface="xenbr0"/>
>>> </clusternode>
>>> </clusternodes>
>>> <cman expected_votes="1" two_node="0">
>>> <multicast addr="239.192.75.55"/>
>>> </cman>
>>> <fencedevices>
>>> <fencedevice agent="fence_gnbd" name="gnbd-fence"
>>> servers="rhelclu03.hpulabs.org"/>
>>> </fencedevices>
>>> <rm log_facility="local4" log_level="7">
>>> <failoverdomains>
>>> <failoverdomain name="PriCluster" ordered="1"
>>> restricted="1">
>>> <failoverdomainnode
>>> name="rhelclu01.hpulabs.org" priority="1"/>
>>> <failoverdomainnode
>>> name="rhelclu02.hpulabs.org" priority="2"/>
>>> </failoverdomain>
>>> <failoverdomain name="SecCluster" ordered="1"
>>> restricted="1">
>>> <failoverdomainnode
>>> name="rhelclu02.hpulabs.org" priority="1"/>
>>> <failoverdomainnode
>>> name="rhelclu01.hpulabs.org" priority="2"/>
>>> </failoverdomain>
>>> </failoverdomains>
>>> <resources>
>>> <ip address="172.25.50.10" monitor_link="1"/>
>>> <ip address="172.25.50.11" monitor_link="1"/>
>>> <ip address="172.25.50.12" monitor_link="1"/>
>>> <ip address="172.25.50.13" monitor_link="1"/>
>>> <ip address="172.25.50.14" monitor_link="1"/>
>>> <ip address="172.25.50.15" monitor_link="1"/>
>>> <ip address="172.25.50.16" monitor_link="1"/>
>>> <ip address="172.25.50.17" monitor_link="1"/>
>>> <ip address="172.25.50.18" monitor_link="1"/>
>>> <ip address="172.25.50.19" monitor_link="1"/>
>>> <ip address="172.25.50.20" monitor_link="1"/>
>>> </resources>
>>> <service autostart="1" domain="PriCluster" name="dns-svc"
>>> recovery="relocate">
>>> <ip ref="172.25.50.10">
>>> <script
>>> file="/data/cfgcluster/etc/init.d/named" name="named"/>
>>> </ip>
>>> </service>
>>> <service autostart="1" domain="SecCluster" name="mail-svc"
>>> recovery="relocate">
>>> <ip ref="172.25.50.11">
>>> <script
>>> file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
>>> </ip>
>>> </service>
>>> <service autostart="1" domain="SecCluster" name="rsync-svc"
>>> recovery="relocate">
>>> <ip ref="172.25.50.13">
>>> <script
>>> file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
>>> </ip>
>>> </service>
>>> <service autostart="1" domain="PriCluster" name="wwwsoft-svc"
>>> recovery="relocate">
>>> <ip ref="172.25.50.14">
>>> <script
>>> file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
>>> </ip>
>>> </service>
>>> <service autostart="1" domain="SecCluster" name="proxy-svc"
>>> recovery="relocate">
>>> <ip ref="172.25.50.15">
>>> <script
>>> file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
>>> </ip>
>>> </service>
>>> </rm>
>>> </cluster>
>>>
>>> The service that returns errors and never starts when rgmanager starts up is
>>> postfix-cluster. In the maillog file I find this error:
>>
>>
>>> Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter inet_interfaces: no
>>> local interface found for 172.25.50.11
>>> Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal:
>>> /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied
>>
>>> but that's not true. If I start this service manually, all works ok. The Postfix
>>> configuration is ok. What can be the problem? I don't know why rgmanager
>>> doesn't configure the 172.25.50.11 address before executing the postfix-cluster service ....
>>
>
> Hi Lon,
>
>
>> When you start it manually -- how?
>> * add IP manually / running the script?
> Yes, and it works.
>
>> * rg_test?
>
> Works.
>
>
>> * clusvcadm -e?
>
> Sometimes it works, sometimes not. I need to disable the service first, and sometimes
> when I try to re-enable it works and other times not.
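The disable-then-re-enable dance described here can be scripted with a bounded retry. A sketch only: clusvcadm and the mail-svc name come from this thread, while the retry count and delay are arbitrary choices of mine.

```shell
# Disable a service, then retry the enable a few times before giving up.
restart_service() {
    svc=$1
    clusvcadm -d "$svc" || return 1    # disable first
    for attempt in 1 2 3; do
        clusvcadm -e "$svc" && return 0
        sleep 5                        # arbitrary pause between retries
    done
    return 1
}
# usage on a cluster node:  restart_service mail-svc
```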
>>
>> -- Lon
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 27 Nov 2007 17:26:01 +0100
> From: Alain Moulle <Alain.Moulle at bull.net>
> Subject: [Linux-cluster] Re: Re: CS4 : problem with multiple IP
> addresses
> To: linux-cluster at redhat.com
> Message-ID: <474C4519.8060205 at bull.net>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Patrick
>
> you mean like this in cluster.conf :
> <clusternodes>
> <clusternode name="192.168.1.2" votes="1">
> <fence>
> <method name="1">
> <device name="NODE_NAMEfence" option="reboot"/>
> </method>
> </fence>
> </clusternode>
> ..
>
> ???
>
> and if so, we should use "cman_tool join -d -n 192.168.1.2" instead
> of "service cman start"
>
> Is this right?
>
> Thanks
> Regards
> Alain
>
>> Setting the IP address in cluster.conf and starting the cluster like
>> this works:
>> cman_tool join -d -n 192.168.1.2
>> What you have sounds like a bug, can you give us some more information
>> please ? cluster.conf files, errors from 'cman_tool join -d' and output
>> from dnslookup/host ?
>> Thanks
>> Patrick
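A sketch of how the diagnostics Patrick asks for might be collected; the grep pattern matches the cluster.conf snippets quoted in this thread, and the commented commands need a live cluster node.

```shell
# Pull the clusternode name attributes out of cluster.conf so they can be
# compared against DNS; the cluster debug/lookup commands are left commented.
CONF=/etc/cluster/cluster.conf
grep -o 'clusternode name="[^"]*"' "$CONF" 2>/dev/null
# cman_tool join -d 2>&1 | tee /tmp/cman-join.log   # debug output for the list
# host 192.168.1.2                                  # reverse-lookup check
```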
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 27 Nov 2007 16:34:09 +0000
> From: Patrick Caulfield <pcaulfie at redhat.com>
> Subject: Re: [Linux-cluster] Re: Re: CS4 : problem with multiple IP
> addresses
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <474C4701.30602 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Alain Moulle wrote:
>> Hi Patrick
>>
>> you mean like this in cluster.conf :
>> <clusternodes>
>> <clusternode name="192.168.1.2" votes="1">
>> <fence>
>> <method name="1">
>> <device name="NODE_NAMEfence" option="reboot"/>
>> </method>
>> </fence>
>> </clusternode>
>> ...
>>
>> ???
>>
>> and if so, we should use "cman_tool join -d -n 192.168.1.2" instead
>> of "service cman start"
>>
>> Is this right?
>
> Well, it's nasty but it worked for me. I'm happy to actually fix the bug
> if I can reproduce it.
>
> Patrick
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 27 Nov 2007 10:34:18 -0600
> From: "isplist at logicore.net" <isplist at logicore.net>
> Subject: [Linux-cluster] Any thoughts on losing mount?
> To: linux-cluster <linux-cluster at redhat.com>
> Message-ID: <20071127103418.715496 at leena>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm pulling my hair out here :).
> One node in my cluster has decided that it doesn't want to mount a storage
> partition that the other nodes mount without problems. The console
> messages say that there is an inconsistency in the filesystem, yet none of the
> other nodes are complaining.
>
> I cannot figure this one out, so I am hoping someone on the list can give me some
> leads on what else to look for, as I do not want to cause any new problems.
>
> Mike
>
>
> Nov 27 10:29:26 compdev kernel: GFS: Trying to join cluster "lock_dlm",
> "vgcomp:web"
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Joined cluster. Now
> mounting FS...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Trying to
> acquire journal lock...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Looking at
> journal...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Done
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log
> elements...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked
> inodes
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes
> for 0 IDs
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Done
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem
> consistency error
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: RG = 31104599
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: function =
> gfs_setbit
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: file =
> /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/
> gfs/bits.c, line = 71
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: time = 1196180975
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from
> the cluster
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: waiting for
> outstanding I/O
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: telling LM to withdraw
> Nov 27 10:29:37 compdev kernel: lock_dlm: withdraw abandoned memory
> Nov 27 10:29:37 compdev kernel: GFS: fsid=vgcomp:web.3: withdrawn
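When a node withdraws like this, the usual recovery path is to unmount the filesystem on every node and run gfs_fsck. A sketch under assumptions: the log path, mount point, and device path are guesses, and the awk line only extracts the damaged resource-group number reported above.

```shell
# Extract the damaged resource group (RG) number from the kernel log,
# then repair offline; gfs_fsck needs the fs unmounted on ALL nodes.
LOG=/var/log/messages                            # assumption
RG=$(awk '/GFS:.*RG =/ { print $NF }' "$LOG" 2>/dev/null | tail -1)
echo "damaged resource group: ${RG:-unknown}"
# umount /mnt/web                # on every node first (mount point a guess)
# gfs_fsck -v /dev/vgcomp/web    # then repair the filesystem
```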
>
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 27 Nov 2007 08:52:34 -0800
> From: Scott Becker <scottb at bxwa.com>
> Subject: Re: [Linux-cluster] Service Recovery Failure
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <474C4B52.6080400 at bxwa.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
>
> Lon Hohberger wrote:
>> On Mon, 2007-11-26 at 14:36 -0800, Scott Becker wrote:
>>
>>
>>> openais[9498]: [CLM ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM ] New Configuration:
>>> kernel: dlm: closing connection to node 3
>>> fenced[9568]: 205.234.65.133 not a cluster member after 0 sec
>>> post_fail_delay
>>> openais[9498]: [CLM ] r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM ] Members Left:
>>> openais[9498]: [CLM ] r(0) ip(205.234.65.133)
>>> openais[9498]: [CLM ] Members Joined:
>>> openais[9498]: [CLM ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM ] New Configuration:
>>> openais[9498]: [CLM ] r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM ] Members Left:
>>> openais[9498]: [CLM ] Members Joined:
>>> openais[9498]: [SYNC ] This node is within the primary component and
>>> will provide service.
>>> openais[9498]: [TOTEM] entering OPERATIONAL state.
>>> openais[9498]: [CLM ] got nodejoin message 205.234.65.132
>>> openais[9498]: [CPG ] got joinlist message from node 2
>>>
>>
>> Did it even try to run the fence_apc agent? It should have done
>> *something* - it didn't even look like it tried to fence.
>>
>> -- Lon
>>
>>
> No sign of an attempt. How do I turn up the verbosity of fenced? I'll
> repeat the test. The only mention I can find is -D but I don't know how
> I can use that. I'll browse the source and see if I can learn anything.
> I'm using 2.0.73.
>
> thanks
> scottb
>
>
>
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://www.redhat.com/archives/linux-cluster/attachments/20071127/09991466/attachment.html
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 43, Issue 37
> *********************************************
>