[Linux-cluster] Problem in virtual cluster
Nuno Fernandes
npf-mlists at eurotux.com
Fri Jan 25 12:18:07 UTC 2008
Ahh, I forgot to include cluster.xml. I'm also using the 2.6.18-8.1.14.el5xen kernel.
<?xml version="1.0"?>
<cluconfig version="3.0">
<clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
multicast_ipaddress="" thread="yes" tko_count="20"/>
<cluquorumd loglevel="5" pinginterval="" tiebreaker_ip=""/>
<clurmtabd loglevel="5" pollinterval="4"/>
<clusvcmgrd loglevel="5"/>
<clulockd loglevel="5"/>
<cluster config_viewnumber="3" key="975b29840bb8835ce57b0fff3354fabc"
name="Cluster"/>
<sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
rawshadow="/dev/raw/raw2" type="raw"/>
<members>
<member id="0" name="cl1" watchdog="yes">
</member>
<member id="1" name="cl2" watchdog="yes"/>
</members>
<services>
<service checkinterval="20" failoverdomain="None" id="0"
maxfalsestarts="0" maxrestarts="0" name="mysql"
userscript="/etc/init.d/mysql1">
<service_ipaddresses>
<service_ipaddress broadcast="172.30.5.255" id="0"
ipaddress="172.30.5.113" monitor_link="0" netmask="255.255.255.0"/>
</service_ipaddresses>
<device id="0" name="/dev/hda5" sharename="">
<mount forceunmount="yes" fstype="ext3" mountpoint="/var/lib/mysql1"
options="sync,rw,nosuid"/>
</device>
</service>
<service checkinterval="0" failoverdomain="None" id="1" maxfalsestarts="0"
maxrestarts="0" name="nfs" userscript="None">
<service_ipaddresses>
<service_ipaddress broadcast="172.30.5.255" id="0"
ipaddress="172.30.5.114" monitor_link="0" netmask="255.255.255.0"/>
</service_ipaddresses>
</service>
</services>
<failoverdomains/>
</cluconfig>
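For reference, the heartbeat-related settings in the file above can be read back with a short script. This is only a sketch: the embedded XML is a trimmed copy of the config above, `heartbeat_summary` is a name made up for illustration, and treating `interval` as microseconds is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.xml above: only the membership daemon
# settings and the member list are kept.
CLUSTER_XML = """<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
            multicast_ipaddress="" thread="yes" tko_count="20"/>
  <members>
    <member id="0" name="cl1" watchdog="yes"/>
    <member id="1" name="cl2" watchdog="yes"/>
  </members>
</cluconfig>
"""


def heartbeat_summary(xml_text):
    """Return the heartbeat transport, timing and member names (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    membd = root.find("clumembd")

    if membd.get("multicast") == "yes":
        transport = "multicast"
    elif membd.get("broadcast") == "yes":
        transport = "broadcast"
    else:
        transport = "unknown"

    interval = int(membd.get("interval"))
    tko_count = int(membd.get("tko_count"))

    return {
        "transport": transport,
        "interval": interval,
        "tko_count": tko_count,
        # ASSUMPTION: interval is in microseconds, so this is the time
        # without heartbeats before a member would be declared down.
        "failure_window_s": interval * tko_count / 1_000_000,
        "members": [m.get("name") for m in root.findall("members/member")],
    }


summary = heartbeat_summary(CLUSTER_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With this config the summary reports a broadcast heartbeat between cl1 and cl2, so broadcast delivery between the two guests is the part worth checking.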
Thanks
Nuno Fernandes
On Friday 25 January 2008 12:00:06 Nuno Fernandes wrote:
> Hi,
>
> I'm in the process of migrating a cluster of two nodes to two virtual
> machines.
>
> The real servers have clumanager-1.0.28-1 (RHEL3/CentOS3).
> I've migrated all the filesystems and started the process of
> reconfiguring the cluster.
>
> The real servers clustat:
>
> Cluster Status Monitor (Cluster) 11:50:14
>
> Cluster alias: Not Configured
>
> ========================= M e m b e r   S t a t u s ==========================
>
> Member         Status     Node Id    Power Switch
> -------------- ---------- ---------- ------------
> cl1            Up         0          Good
> cl2            Up         1          Good
>
> ========================= H e a r t b e a t   S t a t u s ====================
>
> Name                           Type       Status
> ------------------------------ ---------- ------------
> cl1 <--> cl2                   network    ONLINE
> cln1 <--> cln2                 network    ONLINE
>
> ========================= S e r v i c e   S t a t u s ========================
>
>                                        Last             Monitor  Restart
> Service        Status   Owner          Transition       Interval Count
> -------------- -------- -------------- ---------------- -------- -------
> mysql1         started  cl2            00:16:28 Oct 23  10       1
> nfs            started  cl2            23:20:58 Oct 08  10       0
>
>
> Everything is about the same in the virtual cluster, except that the
> virtual nodes don't have any power switch and there is only one network.
> Both nodes use the network and quorum to check if the other node is OK.
>
> The problem is in the virtual cluster. I've upgraded to clumanager-1.2.34-3
> in the virtual cluster to check whether it was a bug in the previous
> version. Neither node can see the other through the network; each thinks
> the other is Inactive. When I start clumanager on cl1 I get:
>
> Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster
> Manager...
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> configured for host 'cl1'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> may be compromised!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> configured for host 'cl2'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> may be compromised!
> Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
> Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
> Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting
> Service Manager
> Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping
> service mysql ...
> Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 stop'
> Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped
> service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping
> service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped
> service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service
> mysql
> Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service
> nfs
> Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting
> service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting
> service nfs ...
> Jan 25 11:52:37 cl1 kernel: kjournald starting. Commit interval 5 seconds
> Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
> Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data
> mode.
> Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is
> installed
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 start'
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started
> service mysql ...
> Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started
> service nfs ...
>
> Everything seems OK. Then I start clumanager on cl2:
>
> cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
> Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster
> Manager...
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> configured for host 'cl1'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may
> be compromised!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> configured for host 'cl2'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may
> be compromised!
> Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
> Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as
> down, but disk reports as up: State uncertain!
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting
> Service Manager
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping
> service mysql ...
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 stop'
> Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped
> service mysql ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping
> service nfs ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped
> service nfs ...
>
> Now we have a problem...
> "cluquorumd[7666]: <warning> Membership reports #0 as down, but disk
> reports as up: State uncertain!"
>
> Clustat from cl1 reports:
>
> Cluster Status - Cluster 11:54:16
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
> Member             Status
> ------------------ ----------
> cl1                Active     <-- You are here
> cl2                Inactive
>
> Service        Status   Owner (Last)     Last Transition Chk Restarts
> -------------- -------- ---------------- --------------- --- --------
> mysql          started  cl1              11:52:37 Jan 25 20  0
> nfs            started  cl1              11:52:37 Jan 25 0   0
>
> Clustat from cl2 reports:
> Cluster Status - Cluster 11:56:30
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
> Member             Status
> ------------------ ----------
> cl1                Inactive
> cl2                Active     <-- You are here
>
> Service        Status   Owner (Last)     Last Transition Chk Restarts
> -------------- -------- ---------------- --------------- --- --------
> mysql          started  cl1              11:52:37 Jan 25 20  0
> nfs            started  cl1              11:52:37 Jan 25 0   0
>
> I have network connectivity working:
>
> [root@cl1 root]# ping -c2 -s30000 cl2
> PING cl2 (172.30.5.112) 30000(30028) bytes of data.
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
>
> [root@cl2 root]# ping -c2 -s30000 cl1
> PING cl1 (172.30.5.111) 30000(30028) bytes of data.
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
>
> Quorum seems OK, but network membership doesn't.
>
> [root@cl1 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
> ss_magic = 0x39119fcd
> ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
> ss_updateHost = cl1.datacenter.imoportal.pt
> }
>
> [root@cl2 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
> ss_magic = 0x39119fcd
> ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
> ss_updateHost = cl1.datacenter.imoportal.pt
> }
>
>
> Any ideas? Thanks
> Nuno Fernandes
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
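
The pings in the quoted message only show that unicast ICMP works, while the cluster.xml sets broadcast="yes" for clumembd. One way to check whether UDP broadcast datagrams actually pass between the two virtual machines is a tiny sender/listener pair. This is a rough sketch under stated assumptions: `PROBE_PORT` is an arbitrary test port and not the port clumembd uses, the function names are invented here, and Python 3 is assumed to be available on both nodes. 172.30.5.255 is the broadcast address taken from the config.

```python
import socket

# HYPOTHETICAL test port chosen for this sketch; it is NOT clumembd's port.
PROBE_PORT = 50007


def open_listener(port=PROBE_PORT, bind_addr=""):
    """Bind a UDP socket; an empty bind_addr listens on all interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock


def send_probe(dest_addr, port=PROBE_PORT, payload=b"hb-probe"):
    """Send one UDP datagram; SO_BROADCAST permits a broadcast destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (dest_addr, port))
    finally:
        sock.close()


def wait_for_probe(sock, timeout=5.0):
    """Return (sender_ip, payload), or None if nothing arrives in time."""
    sock.settimeout(timeout)
    try:
        data, (addr, _port) = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return addr, data
```

To use it, run `wait_for_probe(open_listener(), 30)` on cl2 and then `send_probe("172.30.5.255")` on cl1, and repeat in the other direction. A `None` result on the listener while ping still works would suggest that broadcast traffic is being dropped between the guests, though it would not by itself prove that this is what clumembd is hitting.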