[Linux-cluster] Problem in virtual cluster

Fri Jan 25 12:18:07 UTC 2008

Ahh, I forgot to include cluster.xml. I'm also using the 2.6.18-8.1.14.el5xen
kernel.

<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
            multicast_ipaddress="" thread="yes" tko_count="20"/>
  <cluquorumd loglevel="5" pinginterval="" tiebreaker_ip=""/>
  <clurmtabd loglevel="5" pollinterval="4"/>
  <clusvcmgrd loglevel="5"/>
  <clulockd loglevel="5"/>
  <cluster config_viewnumber="3" key="975b29840bb8835ce57b0fff3354fabc"
           name="Cluster"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
               rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="cl1" watchdog="yes">
    </member>
    <member id="1" name="cl2" watchdog="yes"/>
  </members>
  <services>
    <service checkinterval="20" failoverdomain="None" id="0"
             maxfalsestarts="0" maxrestarts="0" name="mysql"
             userscript="/etc/init.d/mysql1">
      <service_ipaddresses>
        <service_ipaddress broadcast="172.30.5.255" id="0"
                           ipaddress="172.30.5.113" monitor_link="0"
                           netmask="255.255.255.0"/>
      </service_ipaddresses>
      <device id="0" name="/dev/hda5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/var/lib/mysql1"
               options="sync,rw,nosuid"/>
      </device>
    </service>
    <service checkinterval="0" failoverdomain="None" id="1" maxfalsestarts="0"
             maxrestarts="0" name="nfs" userscript="None">
      <service_ipaddresses>
        <service_ipaddress broadcast="172.30.5.255" id="0"
                           ipaddress="172.30.5.114" monitor_link="0"
                           netmask="255.255.255.0"/>
      </service_ipaddresses>
    </service>
  </services>
  <failoverdomains/>
</cluconfig>
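
For anyone comparing this file between the two nodes, here is a minimal
Python sketch that pulls out the membership (heartbeat) settings and the
member ids. It is only an illustration: the embedded excerpt is copied from
the config above, the function name summarize() is made up for this example,
and nothing here is part of clumanager itself.

```python
import xml.etree.ElementTree as ET

# Excerpt of the cluster.xml posted above (only the parts inspected here).
CONFIG_EXCERPT = """<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
            multicast_ipaddress="" thread="yes" tko_count="20"/>
  <members>
    <member id="0" name="cl1" watchdog="yes"/>
    <member id="1" name="cl2" watchdog="yes"/>
  </members>
</cluconfig>
"""


def summarize(xml_text):
    """Return the clumembd heartbeat attributes and a member id -> name map."""
    root = ET.fromstring(xml_text)
    membd = root.find("clumembd")
    return {
        "broadcast": membd.get("broadcast"),
        "multicast": membd.get("multicast"),
        "interval": int(membd.get("interval")),
        "tko_count": int(membd.get("tko_count")),
        "members": {int(m.get("id")): m.get("name") for m in root.iter("member")},
    }


summary = summarize(CONFIG_EXCERPT)
for key, value in summary.items():
    print(key, "=", value)
```

Running the same function against the real file on cl1 and on cl2 (read with
open(...).read()) and comparing the two results is a quick way to confirm
that both nodes agree on the heartbeat mode and on which member has which id.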

Thanks
Nuno Fernandes

On Friday 25 January 2008 12:00:06 Nuno Fernandes wrote:
> Hi,
>
> I'm in the process of migrating a cluster of two nodes to two virtual
> machines.
>
> The real servers have clumanager-1.0.28-1 (RHEL3/CentOS3).
> I've migrated all the filesystems and started the process of
> reconfiguring the cluster.
>
> The real servers clustat:
>
> Cluster Status Monitor (Cluster)                              11:50:14
>
> Cluster alias: Not Configured
>
> =========================  M e m b e r   S t a t u s  ==========================
>
>   Member         Status     Node Id    Power Switch
>   -------------- ---------- ---------- ------------
>   cl1            Up         0          Good
>   cl2            Up         1          Good
>
> =========================  H e a r t b e a t   S t a t u s  ====================
>
>   Name                           Type       Status
>   ------------------------------ ---------- ------------
>   cl1          <--> cl2          network    ONLINE
>   cln1         <--> cln2         network    ONLINE
>
> =========================  S e r v i c e   S t a t u s  ========================
>
>                                          Last             Monitor  Restart
>   Service        Status   Owner          Transition       Interval Count
>   -------------- -------- -------------- ---------------- -------- -------
>   mysql1         started  cl2            00:16:28 Oct 23  10       1
>   nfs            started  cl2            23:20:58 Oct 08  10       0
>
>
> Everything is about the same in the virtual cluster, except that the nodes
> have no power switch and there is only one network. Both nodes use the
> network and the shared quorum partitions to check whether the other node is
> OK.
>
> The problem is in the virtual cluster. I upgraded it to clumanager-1.2.34-3
> to check whether this was a bug in the previous version. The nodes can't see
> each other through the network; each thinks the other is Inactive. When I
> start clumanager on cl1 I get:
>
> Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster
> Manager...
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> configured for host 'cl1'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> may be compromised!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers
> configured for host 'cl2'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity
> may be compromised!
> Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
> Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
> Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting
> Service Manager
> Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping
> service mysql ...
> Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 stop'
> Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped
> service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping
> service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped
> service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service
> mysql
> Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service
> nfs
> Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting
> service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting
> service nfs ...
> Jan 25 11:52:37 cl1 kernel: kjournald starting.  Commit interval 5 seconds
> Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
> Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data
> mode.
> Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is
> installed
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 start'
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started
> service mysql ...
> Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started
> service nfs ...
>
> Everything seems OK. Then I start clumanager on cl2:
>
> cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
> Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster
> Manager...
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> configured for host 'cl1'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may
> be compromised!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers
> configured for host 'cl2'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may
> be compromised!
> Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
> Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as
> down, but disk reports as up: State uncertain!
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting
> Service Manager
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping
> service mysql ...
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running
> user script '/etc/init.d/mysql1 stop'
> Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped
> service mysql ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping
> service nfs ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped
> service nfs ...
>
> Now we have a problem...
> "cluquorumd[7666]: <warning> Membership reports #0 as down, but disk
> reports as up: State uncertain!"
>
> Clustat from cl1 reports:
>
> Cluster Status - Cluster                                      11:54:16
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
>   Member             Status
>   ------------------ ----------
>   cl1                Active     <-- You are here
>   cl2                Inactive
>
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   mysql          started  cl1              11:52:37 Jan 25  20        0
>   nfs            started  cl1              11:52:37 Jan 25   0        0
>
> Clustat from cl2 reports:
> Cluster Status - Cluster                                      11:56:30
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
>   Member             Status
>   ------------------ ----------
>   cl1                Inactive
>   cl2                Active     <-- You are here
>
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   mysql          started  cl1              11:52:37 Jan 25  20        0
>   nfs            started  cl1              11:52:37 Jan 25   0        0
>
> I have network connectivity working:
>
> [root@cl1 root]# ping -c2 -s30000 cl2
> PING cl2 (172.30.5.112) 30000(30028) bytes of data.
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
>
> [root@cl2 root]# ping -c2 -s30000 cl1
> PING cl1 (172.30.5.111) 30000(30028) bytes of data.
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
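
The pings above only exercise ICMP unicast. Since the cluster.xml in this
thread has broadcast="yes" for clumembd, it may also be worth checking whether
UDP broadcast datagrams actually travel between the two guests. Below is a
minimal Python sketch of such a check. Assumptions: port 50000 is an arbitrary
test port chosen for this example (it is NOT claimed to be the port clumembd
uses), and the function names are made up here.

```python
import socket


def open_listener(bind_addr="", port=50000):
    """Bind a UDP socket; an empty bind_addr listens on all interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock


def send_probe(dest_addr, port=50000, payload=b"hb-probe"):
    """Send one UDP datagram; SO_BROADCAST permits broadcast destinations."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    try:
        sock.sendto(payload, (dest_addr, port))
    finally:
        sock.close()


def wait_for_probe(listener, timeout=2.0):
    """Return (payload, sender_ip), or None if nothing arrives in time."""
    listener.settimeout(timeout)
    try:
        data, peer = listener.recvfrom(1024)
    except socket.timeout:
        return None
    return data, peer[0]
```

A possible way to use it: on cl2 run
print(wait_for_probe(open_listener(), timeout=30)), then within 30 seconds run
send_probe("172.30.5.255") on cl1 (172.30.5.255 is the broadcast address from
the config). If cl2 prints None while a unicast send_probe("172.30.5.112")
arrives, broadcast traffic is probably being dropped somewhere between the
guests. Repeat in the other direction.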
>
> The quorum seems OK, but the network doesn't.
>
> [root@cl1 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
>         ss_magic = 0x39119fcd
>         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
>         ss_updateHost = cl1.datacenter.imoportal.pt
> }
>
> [root@cl2 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
>         ss_magic = 0x39119fcd
>         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
>         ss_updateHost = cl1.datacenter.imoportal.pt
> }
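
As a sanity check on the header output, the ss_timestamp value can be decoded
by hand. The small Python sketch below assumes the field is a count of seconds
since the Unix epoch and that shutil printed the time in a zone equal to UTC;
the function name is made up for this example.

```python
from datetime import datetime, timezone


def decode_ss_timestamp(hex_value):
    """Interpret a hex string as seconds since the Unix epoch (assumed), in UTC."""
    seconds = int(hex_value, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = decode_ss_timestamp("0x000000004798e63b")
print(stamp.strftime("%H:%M:%S %b %d %Y"))  # 19:25:47 Jan 24 2008
```

Under those assumptions the decoded value matches what shutil printed. Both
nodes report the same header, written by cl1 the evening before the logs
above, so the header shows that both nodes read the same shared state but
presumably says nothing about whether either node is alive right now.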
>
>
> Any ideas? Thanks
> Nuno Fernandes