[Linux-cluster] Cluster fails after fencing by DRAC
MARY, Mathieu
Mathieu.MARY at neufcegetel.fr
Fri Jan 11 11:00:09 UTC 2008
Hello,
Sorry to ask, but is the "none" state a normal state for services?
I have issues with cluster services too, and I've been told that this state
is not normal and indicates that the nodes didn't join the fence domain, which causes issues with rgmanager too.
What do clustat and cman_tool services show at startup?
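As a side note, a quick way to spot trouble in that output is to look for groups stuck in a FAIL_* state. A minimal sketch, parsing a pasted sample rather than a live cluster (since `cman_tool services` needs a running cman; the sample lines are copied from the report below):

```shell
#!/bin/sh
# Sketch: scan `cman_tool services`-style output for groups stuck in a
# FAIL_* state. Sample text is used here instead of a live cluster.
sample='type level name id state
fence 0 default 0001001f none
dlm 1 rgmanager 00020020 FAIL_STOP_WAIT'

# On a live node this would be: cman_tool services | awk '...'
stuck=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 ~ /^FAIL/ { print $3 }')
echo "stuck groups: $stuck"
```

With the sample above, only the rgmanager group is reported as stuck.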
regards,
Mathieu
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jorge Gonzalez
Sent: Thursday, January 10, 2008 17:18
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Cluster fails after fencing by DRAC
Hi all!
I have a problem with a 3-node cluster. When I run "fence_node node1",
node1 is successfully rebooted via DRAC. When node1 restarts, it freezes:
------------------
starting clvmd: dlm: got connection from 32
dlm: connecting to 33
dlm: got connection from 33
[frozen]
* cman_tool services shows:
type             level name       id       state
fence            0     default    0001001f none
[31 32 33]
dlm              1     clvmd      00010020 none
[31 32 33]
dlm              1     rgmanager  00020020 none
[32 33]
It seems node 31 is missing from the rgmanager group (?)
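To double-check which member is missing, the clustat member list can be compared with the bracketed node list of the rgmanager group. A small sketch, with the two lists copied from the output in this mail rather than read from a live node:

```shell
#!/bin/sh
# Sketch: report clustat members absent from the rgmanager dlm group.
# Both lists are hard-coded from the captured output, not queried live.
members="31 32 33"       # node IDs listed by clustat
rg_group="32 33"         # bracketed list under the rgmanager line

missing=""
for id in $members; do
    case " $rg_group " in
        *" $id "*) ;;                # node is in the group
        *) missing="$missing $id" ;; # node never joined the group
    esac
done
echo "missing from rgmanager group:$missing"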
* clustat shows:
Member Status: Quorate
Member Name          ID  Status
------ ----          --  ------
xenr3u1.domain.com   31  Online
xenr3u2.domain.com   32  Online, Local
xenr3u3.domain.com   33  Online
-------------------
Then I rebooted node1 again:
Starting cluster
Loading modules DLM .......
done
starting ccsd
starting cman
starting daemons
starting fencing
[frozen again]
After a long time, "starting fencing" completes with [done], but cman_tool services reports failures:
* cman_tool services shows:
type             level name       id       state
fence            0     default    0001001f FAIL_ALL_STOPPED
[31 32 33]
dlm              1     clvmd      00010020 FAIL_STOP_WAIT
[31 32 33]
dlm              1     rgmanager  00020020 FAIL_STOP_WAIT
* clustat shows:
Member Status: Quorate
Member Name          ID  Status
------ ----          --  ------
xenr3u1.domain.com   31  Online
xenr3u2.domain.com   32  Online, Local
xenr3u3.domain.com   33  Online
/etc/init.d/rgmanager restart
Shutting down Cluster Service Manager...
Waiting for services to stop:
[hangs for a very long time]
----------------------------------
I found this page translated to English
(http://translate.google.com/translate?u=http%3A%2F%2Fken-etsu-tech.blogspot.com%2F2007%2F11%2Fred-hat-cluster-kernel-xen.html&langpair=ja%7Cen&hl=es&ie=UTF-8).
It describes exactly the same problem. Is this a kernel bug? A clvmd bug?
Linux xenr3u2 2.6.18-8.1.15.el5xen #1 SMP Mon Oct 22 09:01:12 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
lvm2-cluster-2.02.16-3.el5
Sometimes the node starts OK and the cman_tool output is also fine.
* /etc/lvm.conf:
devices {
    dir = "/dev"
    scan = [ "/dev" ]
    filter = [ "a/.*/" ]
    cache = "/etc/lvm/.cache"
    write_cache_state = 1
    sysfs_scan = 1
    md_component_detection = 1
}
log {
    verbose = 0
    syslog = 1
    overwrite = 0
    level = 0
    indent = 1
    command_names = 0
    prefix = " "
}
backup {
    backup = 1
    backup_dir = "/etc/lvm/backup"
    archive = 1
    archive_dir = "/etc/lvm/archive"
    retain_min = 10
    retain_days = 30
}
shell {
    history_size = 100
}
global {
    library_dir = "/usr/lib64"
    umask = 077
    test = 0
    activation = 1
    proc = "/proc"
    locking_type = 3
    fallback_to_clustered_locking = 1
    fallback_to_local_locking = 1
    locking_dir = "/var/lock/lvm"
}
activation {
    missing_stripe_filler = "/dev/ioerror"
    reserved_stack = 256
    reserved_memory = 8192
    process_priority = -18
    mirror_region_size = 512
    mirror_log_fault_policy = "allocate"
    mirror_device_fault_policy = "remove"
}
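For what it's worth, locking_type = 3 in the global section is what selects clustered locking through clvmd, so it can be worth confirming that every node really runs with it. A minimal sketch, using a temporary sample file rather than the node's real /etc/lvm/lvm.conf:

```shell
#!/bin/sh
# Sketch: extract the LVM locking_type from an lvm.conf-style file.
# Type 3 means clustered locking via clvmd; a temp sample is used here.
conf=$(mktemp)
cat > "$conf" <<'EOF'
global {
    locking_type = 3
}
EOF
lt=$(awk -F= '/locking_type/ { gsub(/[[:space:]]/, "", $2); print $2 }' "$conf")
echo "locking_type = $lt"
rm -f "$conf"
```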
That's all ;-)
Thanks in advance