[Linux-cluster] Problem in virtual cluster
Nuno Fernandes
npf-mlists at eurotux.com
Fri Jan 25 12:00:06 UTC 2008
Hi,
I'm in the process of migrating a two-node cluster to two virtual machines.
The physical servers run clumanager-1.0.28-1 (RHEL3/CentOS3).
I've migrated all the filesystems and started reconfiguring the cluster.
clustat on the physical servers:
Cluster Status Monitor (Cluster) 11:50:14
Cluster alias: Not Configured
========================= M e m b e r S t a t u s ==========================
Member         Status     Node Id    Power Switch
-------------- ---------- ---------- ------------
cl1            Up         0          Good
cl2            Up         1          Good
========================= H e a r t b e a t S t a t u s ====================
Name                           Type       Status
------------------------------ ---------- ------------
cl1 <--> cl2                   network    ONLINE
cln1 <--> cln2                 network    ONLINE
========================= S e r v i c e S t a t u s ========================
                                       Last             Monitor  Restart
Service        Status   Owner          Transition       Interval Count
-------------- -------- -------------- ---------------- -------- -------
mysql1         started  cl2            00:16:28 Oct 23  10       1
nfs            started  cl2            23:20:58 Oct 08  10       0
The virtual cluster is set up almost identically, except that the nodes have no
power switch and there is only one network. Both nodes use the network
heartbeat and the quorum partition to check whether the other node is OK.
The problem is in the virtual cluster. To rule out a bug in the older version,
I upgraded the virtual cluster to clumanager-1.2.34-3. Neither node can see the
other over the network; each one reports the other as Inactive.
When I start clumanager on cl1, I get:
Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster Manager...
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl1'!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl2'!
Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting Service Manager
Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping service mysql ...
Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped service mysql ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping service nfs ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped service nfs ...
Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service mysql
Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service nfs
Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting service mysql ...
Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting service nfs ...
Jan 25 11:52:37 cl1 kernel: kjournald starting. Commit interval 5 seconds
Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is installed
Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running user script '/etc/init.d/mysql1 start'
Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started service mysql ...
Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started service nfs ...
Everything seems OK. Then I start clumanager on cl2:
cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster Manager...
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl1'!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl2'!
Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!
Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting Service Manager
Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping service mysql ...
Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped service mysql ...
Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping service nfs ...
Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped service nfs ...
Now we have a problem:
"cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!"
Clustat from cl1 reports:
Cluster Status - Cluster 11:54:16
Cluster Quorum Incarnation #1
Shared State: Shared Raw Device Driver v1.2

Member             Status
------------------ ----------
cl1                Active     <-- You are here
cl2                Inactive

Service        Status   Owner (Last)     Last Transition Chk Restarts
-------------- -------- ---------------- --------------- --- --------
mysql          started  cl1              11:52:37 Jan 25  20        0
nfs            started  cl1              11:52:37 Jan 25   0        0
Clustat from cl2 reports:
Cluster Status - Cluster 11:56:30
Cluster Quorum Incarnation #1
Shared State: Shared Raw Device Driver v1.2

Member             Status
------------------ ----------
cl1                Inactive
cl2                Active     <-- You are here

Service        Status   Owner (Last)     Last Transition Chk Restarts
-------------- -------- ---------------- --------------- --- --------
mysql          started  cl1              11:52:37 Jan 25  20        0
nfs            started  cl1              11:52:37 Jan 25   0        0
I have network connectivity working:
[root@cl1 root]# ping -c2 -s30000 cl2
PING cl2 (172.30.5.112) 30000(30028) bytes of data.
30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
[root@cl2 root]# ping -c2 -s30000 cl1
PING cl1 (172.30.5.111) 30000(30028) bytes of data.
30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
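ping only proves that ICMP gets through. As far as I understand clumanager 1.2, clumembd sends its heartbeats as UDP datagrams (broadcast or multicast, depending on the configuration), so a UDP round trip between the nodes is a closer check. Below is a minimal sketch of such a check in Python. The functions serve_echo and udp_round_trip are hypothetical helpers written for this test, not part of clumanager; the port is whatever you choose, not the port clumembd uses, and the check covers unicast only, not broadcast or multicast through the virtual bridge.

```python
import socket
import threading


def serve_echo(sock: socket.socket, count: int = 1) -> None:
    """Echo back `count` datagrams received on an already-bound UDP socket."""
    for _ in range(count):
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def udp_round_trip(host: str, port: int, payload: bytes = b"hb-test",
                   timeout: float = 2.0) -> bool:
    """Send one datagram to host:port; True only if the same bytes come back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(payload, (host, port))
        try:
            data, _ = s.recvfrom(65535)
        except socket.timeout:
            return False  # nothing came back before the timeout
        return data == payload


if __name__ == "__main__":
    # Local self-check on loopback with an OS-assigned port.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_echo, args=(server,))
    worker.start()
    print(udp_round_trip("127.0.0.1", port))
    worker.join()
    server.close()
```

Between the VMs, the idea would be to bind the server socket to a fixed test port on cl2, run serve_echo there, and call udp_round_trip("cl2", that_port) from cl1, then repeat in the other direction.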
The quorum (shared state) side seems OK; it's the network side that isn't. Both nodes read the same shared state header:
[root@cl1 root]# shutil -p /cluster/header
/cluster/header is 144 bytes long
SharedStateHeader {
ss_magic = 0x39119fcd
ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
ss_updateHost = cl1.datacenter.imoportal.pt
}
[root@cl2 root]# shutil -p /cluster/header
/cluster/header is 144 bytes long
SharedStateHeader {
ss_magic = 0x39119fcd
ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
ss_updateHost = cl1.datacenter.imoportal.pt
}
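Both nodes print an identical header whose ss_updateHost is cl1, which suggests they really do share the same state device. Assuming ss_timestamp holds seconds since the Unix epoch (the value agrees with the date shutil prints next to it), it can be decoded as in this small sketch; decode_ss_timestamp is a hypothetical helper, not a clumanager tool.

```python
from datetime import datetime, timezone

# ss_timestamp exactly as printed by `shutil -p /cluster/header` on both nodes.
SS_TIMESTAMP = 0x000000004798E63B


def decode_ss_timestamp(raw: int) -> datetime:
    """Interpret the header field as Unix epoch seconds, in UTC (assumption)."""
    return datetime.fromtimestamp(raw, tz=timezone.utc)


if __name__ == "__main__":
    print(decode_ss_timestamp(SS_TIMESTAMP).isoformat())
```

This yields 2008-01-24T19:25:47+00:00, so the header was last written on the day before the Jan 25 test runs shown in the logs.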
Any ideas? Thanks.
Nuno Fernandes