[Linux-cluster] Problem in virtual cluster

Nuno Fernandes npf-mlists at eurotux.com
Fri Jan 25 12:00:06 UTC 2008


Hi,

I'm in the process of migrating a cluster of two nodes to two virtual 
machines.

The real servers have clumanager-1.0.28-1 (RHEL3/CentOS3).
I've migrated all the filesystems and started the process of reconfiguring 
the cluster.

clustat output from the real servers:

Cluster Status Monitor (Cluster)                              11:50:14

Cluster alias: Not Configured

=========================  M e m b e r   S t a t u s  ==========================

  Member         Status     Node Id    Power Switch
  -------------- ---------- ---------- ------------
  cl1            Up         0          Good
  cl2            Up         1          Good

=========================  H e a r t b e a t   S t a t u s  ====================

  Name                           Type       Status
  ------------------------------ ---------- ------------
  cl1          <--> cl2          network    ONLINE
  cln1         <--> cln2         network    ONLINE

=========================  S e r v i c e   S t a t u s  ========================

                                         Last             Monitor  Restart
  Service        Status   Owner          Transition       Interval Count
  -------------- -------- -------------- ---------------- -------- -------
  mysql1         started  cl2            00:16:28 Oct 23  10       1
  nfs            started  cl2            23:20:58 Oct 08  10       0


Everything is about the same in the virtual cluster, except that the virtual 
machines don't have a power switch and there is only one network. Both nodes 
use the network and quorum to check whether the other node is OK.

The problem is in the virtual cluster. I upgraded it to clumanager-1.2.34-3 to 
check whether this was a bug in the previous version. The two nodes can't see 
each other through the network; each thinks the other is Inactive.
When I start clumanager on cl1 I get:

Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster Manager...
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl1'!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl2'!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting Service Manager
Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping service mysql ...
Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped service mysql ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping service nfs ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped service nfs ...
Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service mysql
Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service nfs
Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting service mysql ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting service nfs ...
Jan 25 11:52:37 cl1 kernel: kjournald starting.  Commit interval 5 seconds
Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is installed
Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running user script '/etc/init.d/mysql1 start'
Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started service mysql ...
Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started service nfs ...

Everything seems OK. Then I start clumanager on cl2:

cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster Manager...
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl1'!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl2'!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!
Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting Service Manager
Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping service mysql ...
Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped service mysql ...
Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping service nfs ...
Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped service nfs ...

Now we have a problem:
"cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!"

Clustat from cl1 reports:

Cluster Status - Cluster                                      11:54:16
Cluster Quorum Incarnation #1
Shared State: Shared Raw Device Driver v1.2

  Member             Status
  ------------------ ----------
  cl1                Active     <-- You are here
  cl2                Inactive

  Service        Status   Owner (Last)     Last Transition Chk Restarts
  -------------- -------- ---------------- --------------- --- --------
  mysql          started  cl1              11:52:37 Jan 25  20        0
  nfs            started  cl1              11:52:37 Jan 25   0        0

Clustat from cl2 reports:
Cluster Status - Cluster                                      11:56:30
Cluster Quorum Incarnation #1
Shared State: Shared Raw Device Driver v1.2

  Member             Status
  ------------------ ----------
  cl1                Inactive
  cl2                Active     <-- You are here

  Service        Status   Owner (Last)     Last Transition Chk Restarts
  -------------- -------- ---------------- --------------- --- --------
  mysql          started  cl1              11:52:37 Jan 25  20        0
  nfs            started  cl1              11:52:37 Jan 25   0        0
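
The two views disagree in a specific way: each node lists itself as Active and 
its peer as Inactive. A minimal sketch for comparing member tables pasted from 
clustat, assuming the two-column "Member / Status" layout shown above. The 
names parse_members and disagreeing_members are illustrative helpers, not part 
of clumanager.

```python
def parse_members(clustat_text):
    """Return {member: status} from the Member/Status table of clustat output."""
    members = {}
    in_table = False
    for line in clustat_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Member", "Status"]:
            in_table = True          # header row found, rows follow
            continue
        if not in_table:
            continue
        if not fields:
            break                    # a blank line ends the member table
        if set(line.strip()) <= {"-", " "}:
            continue                 # skip the dashed separator row
        members[fields[0]] = fields[1]
    return members


def disagreeing_members(view_a, view_b):
    """Names whose status differs between two nodes' views."""
    return sorted(name for name in view_a if view_a[name] != view_b.get(name))


# Member tables copied from the clustat output above.
CL1_VIEW = """\
  Member             Status
  ------------------ ----------
  cl1                Active     <-- You are here
  cl2                Inactive
"""

CL2_VIEW = """\
  Member             Status
  ------------------ ----------
  cl1                Inactive
  cl2                Active     <-- You are here
"""

print(disagreeing_members(parse_members(CL1_VIEW), parse_members(CL2_VIEW)))
```

Both members show up as disagreeing, while the service tables are identical on 
the two nodes.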

I have network connectivity working:

[root@cl1 root]# ping -c2 -s30000 cl2
PING cl2 (172.30.5.112) 30000(30028) bytes of data.
30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms

[root@cl2 root]# ping -c2 -s30000 cl1
PING cl1 (172.30.5.111) 30000(30028) bytes of data.
30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
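
ping only exercises ICMP. If the membership heartbeat travels over UDP 
broadcast or multicast (an assumption that should be checked against the 
clumanager documentation and configuration), a successful ping does not prove 
that the heartbeat packets get through the virtual network. A minimal sketch 
of a UDP probe follows; PROBE_PORT is a hypothetical test port, not a port 
used by clumanager, and the helper names are illustrative.

```python
import socket

PROBE_PORT = 50007  # hypothetical test port, NOT a clumanager port


def open_listener(bind_addr="", port=PROBE_PORT, timeout=5.0):
    """Bind a UDP socket that waits up to `timeout` seconds for a probe."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    sock.settimeout(timeout)
    return sock


def wait_for_probe(sock):
    """Return (payload, sender_ip), or None if nothing arrived in time."""
    try:
        data, peer = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return data, peer[0]


def send_probe(dest, port=PROBE_PORT, payload=b"probe", broadcast=False):
    """Send one UDP datagram; set broadcast=True for a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if broadcast:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (dest, port))
    finally:
        sock.close()
```

One way to use it: on cl2 call wait_for_probe(open_listener()), then on cl1 
call send_probe() with the subnet's broadcast address and broadcast=True, and 
repeat in the other direction. A None result on the listener would suggest 
that broadcast UDP is being dropped between the two virtual machines.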

The shared quorum state seems OK, but network membership doesn't.

[root@cl1 root]# shutil -p /cluster/header
/cluster/header is 144 bytes long
SharedStateHeader {
        ss_magic = 0x39119fcd
        ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
        ss_updateHost = cl1.datacenter.imoportal.pt
}

[root@cl2 root]# shutil -p /cluster/header
/cluster/header is 144 bytes long
SharedStateHeader {
        ss_magic = 0x39119fcd
        ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
        ss_updateHost = cl1.datacenter.imoportal.pt
}
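
Both nodes print the same header, so both appear to read the same shared 
state. The ss_timestamp field is a Unix epoch value in hexadecimal; decoding 
it as UTC reproduces the date that shutil prints next to it. A minimal sketch 
of that check (decode_ss_timestamp is an illustrative helper, not a clumanager 
tool):

```python
from datetime import datetime, timezone


def decode_ss_timestamp(hex_value):
    """Convert a hexadecimal epoch value such as ss_timestamp to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


stamp = decode_ss_timestamp("0x000000004798e63b")
print(stamp.strftime("%H:%M:%S %b %d %Y"))  # 19:25:47 Jan 24 2008
```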


Any ideas? Thanks
Nuno Fernandes



