[Linux-cluster] Problem in virtual cluster [SOLVED]

Fri Jan 25 12:37:15 UTC 2008

Just for the record...

Solved by adding a tiebreaker IP (tiebreaker_ip was empty in the
cluster.xml quoted below).
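
For anyone reading this in the archive: the change presumably amounts to
filling in the tiebreaker_ip attribute on the cluquorumd line of
cluster.xml. A minimal sketch, assuming 172.30.5.1 is an address on the
cluster subnet that both members can ping (that address is a placeholder
chosen for illustration, not the one actually used):

```xml
<!-- Sketch only: 172.30.5.1 is a hypothetical address that both members
     can reach, for example the subnet gateway. The other attributes are
     copied unchanged from the quoted configuration. -->
<cluquorumd loglevel="5" pinginterval="" tiebreaker_ip="172.30.5.1"/>
```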

Thanks
Nuno Fernandes

On Friday 25 January 2008 12:18:07 Nuno Fernandes wrote:
> Ahh.. I forgot to include cluster.xml. I'm also using the
> 2.6.18-8.1.14.el5xen kernel.
>
> <?xml version="1.0"?>
> <cluconfig version="3.0">
>   <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
> multicast_ipaddress="" thread="yes" tko_count="20"/>
>   <cluquorumd loglevel="5" pinginterval="" tiebreaker_ip=""/>
>   <clurmtabd loglevel="5" pollinterval="4"/>
>   <clusvcmgrd loglevel="5"/>
>   <clulockd loglevel="5"/>
>   <cluster config_viewnumber="3" key="975b29840bb8835ce57b0fff3354fabc"
> name="Cluster"/>
>   <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
> rawshadow="/dev/raw/raw2" type="raw"/>
>   <members>
>     <member id="0" name="cl1" watchdog="yes">
>     </member>
>     <member id="1" name="cl2" watchdog="yes"/>
>   </members>
>   <services>
>     <service checkinterval="20" failoverdomain="None" id="0"
> maxfalsestarts="0" maxrestarts="0" name="mysql"
> userscript="/etc/init.d/mysql1">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="172.30.5.255" id="0"
> ipaddress="172.30.5.113" monitor_link="0" netmask="255.255.255.0"/>
>       </service_ipaddresses>
>       <device id="0" name="/dev/hda5" sharename="">
>         <mount forceunmount="yes" fstype="ext3"
> mountpoint="/var/lib/mysql1" options="sync,rw,nosuid"/>
>       </device>
>     </service>
>     <service checkinterval="0" failoverdomain="None" id="1"
> maxfalsestarts="0" maxrestarts="0" name="nfs" userscript="None">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="172.30.5.255" id="0"
> ipaddress="172.30.5.114" monitor_link="0" netmask="255.255.255.0"/>
>       </service_ipaddresses>
>     </service>
>   </services>
>   <failoverdomains/>
> </cluconfig>
>
> Thanks
> Nuno Fernandes
>
> On Friday 25 January 2008 12:00:06 Nuno Fernandes wrote:
> > Hi,
> >
> > I'm in the process of migrating a cluster of two nodes to two virtual
> > machines.
> >
> > The real servers have clumanager-1.0.28-1 (RHEL3/CentOS3).
> > I've migrated all the filesystems and started the process of
> > reconfiguring the cluster.
> >
> > clustat output on the real servers:
> >
> > Cluster Status Monitor (Cluster)                              11:50:14
> >
> > Cluster alias: Not Configured
> >
> > =========================  M e m b e r   S t a t u s  ==========================
> >
> >   Member         Status     Node Id    Power Switch
> >   -------------- ---------- ---------- ------------
> >   cl1            Up         0          Good
> >   cl2            Up         1          Good
> >
> > =========================  H e a r t b e a t   S t a t u s  ====================
> >
> >   Name                           Type       Status
> >   ------------------------------ ---------- ------------
> >   cl1          <--> cl2          network    ONLINE
> >   cln1         <--> cln2         network    ONLINE
> >
> > =========================  S e r v i c e   S t a t u s  ========================
> >
> >                                          Last             Monitor  Restart
> >   Service        Status   Owner          Transition       Interval Count
> >   -------------- -------- -------------- ---------------- -------- -------
> >   mysql1         started  cl2            00:16:28 Oct 23  10       1
> >   nfs            started  cl2            23:20:58 Oct 08  10       0
> >
> >
> > Everything is about the same in the virtual cluster, except that the
> > virtual nodes don't have a power switch and there is only one network.
> > Both nodes use the network and quorum to check whether the other node
> > is OK.
> >
> > The problem is in the virtual cluster. I've upgraded it to
> > clumanager-1.2.34-3 to check whether it was a bug in the previous
> > version. The two nodes can't see each other through the network: each
> > thinks the other is Inactive. When I start clumanager on cl1 I get:
> >
> > Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat
> > Cluster Manager...
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> > configured for host 'cl1'!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> > may be compromised!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> > configured for host 'cl2'!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> > may be compromised!
> > Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
> > Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
> > Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting
> > Service Manager
> > Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice:
> > Stopping service mysql ...
> > Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running
> > user script '/etc/init.d/mysql1 stop'
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped
> > service mysql ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice:
> > Stopping service nfs ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped
> > service nfs ...
> > Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service
> > mysql
> > Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service
> > nfs
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice:
> > Starting service mysql ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice:
> > Starting service nfs ...
> > Jan 25 11:52:37 cl1 kernel: kjournald starting.  Commit interval 5
> > seconds
> > Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
> > Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data
> > mode.
> > Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent
> > is installed
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running
> > user script '/etc/init.d/mysql1 start'
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started
> > service mysql ...
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started
> > service nfs ...
> >
> > Everything seems OK... Then I start clumanager on cl2:
> >
> > cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
> > Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster
> > Manager...
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> > configured for host 'cl1'!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity
> > may be compromised!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> > configured for host 'cl2'!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity
> > may be compromised!
> > Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
> > Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
> > Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as
> > down, but disk reports as up: State uncertain!
> > Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting
> > Service Manager
> > Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping
> > service mysql ...
> > Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running
> > user script '/etc/init.d/mysql1 stop'
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped
> > service mysql ...
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping
> > service nfs ...
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped
> > service nfs ...
> >
> > Now we have a problem...
> > "cluquorumd[7666]: <warning> Membership reports #0 as down, but disk
> > reports as up: State uncertain!"
> >
> > Clustat from cl1 reports:
> >
> > Cluster Status - Cluster                                      11:54:16
> > Cluster Quorum Incarnation #1
> > Shared State: Shared Raw Device Driver v1.2
> >
> >   Member             Status
> >   ------------------ ----------
> >   cl1                Active     <-- You are here
> >   cl2                Inactive
> >
> >   Service        Status   Owner (Last)     Last Transition Chk Restarts
> >   -------------- -------- ---------------- --------------- --- --------
> >   mysql          started  cl1              11:52:37 Jan 25  20        0
> >   nfs            started  cl1              11:52:37 Jan 25   0        0
> >
> > Clustat from cl2 reports:
> > Cluster Status - Cluster                                      11:56:30
> > Cluster Quorum Incarnation #1
> > Shared State: Shared Raw Device Driver v1.2
> >
> >   Member             Status
> >   ------------------ ----------
> >   cl1                Inactive
> >   cl2                Active     <-- You are here
> >
> >   Service        Status   Owner (Last)     Last Transition Chk Restarts
> >   -------------- -------- ---------------- --------------- --- --------
> >   mysql          started  cl1              11:52:37 Jan 25  20        0
> >   nfs            started  cl1              11:52:37 Jan 25   0        0
> >
> > I have network connectivity working:
> >
> > [root@cl1 root]# ping -c2 -s30000 cl2
> > PING cl2 (172.30.5.112) 30000(30028) bytes of data.
> > 30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
> > 30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
> >
> > [root@cl2 root]# ping -c2 -s30000 cl1
> > PING cl1 (172.30.5.111) 30000(30028) bytes of data.
> > 30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
> > 30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
> >
> > Quorum seems ok, but network doesn't.
> >
> > [root@cl1 root]# shutil -p /cluster/header
> > /cluster/header is 144 bytes long
> > SharedStateHeader {
> >         ss_magic = 0x39119fcd
> >         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
> >         ss_updateHost = cl1.datacenter.imoportal.pt
> > }
> >
> > [root@cl2 root]# shutil -p /cluster/header
> > /cluster/header is 144 bytes long
> > SharedStateHeader {
> >         ss_magic = 0x39119fcd
> >         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
> >         ss_updateHost = cl1.datacenter.imoportal.pt
> > }
> >
> >
> > Any ideas? Thanks
> > Nuno Fernandes
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster