[Linux-cluster] Freeze with cluster-2.03.11
Kadlecsik Jozsef
kadlec at mail.kfki.hu
Thu Mar 26 22:47:00 UTC 2009
Hi,
A freshly built cluster-2.03.11 reproducibly freezes when mailman is started.
The versions are:
linux-2.6.27.21
cluster-2.03.11
openais from svn, subrev 1152, version 0.80
LVM2.2.02.44
This is a five-node cluster which was just upgraded from cluster-2.01.00,
node by node. All nodes went fine except the last one, which runs the
mailman queue manager: after the upgrade, as soon as the manager is
started, the system freezes completely. There is no error message on the
screen or in the kernel log. The system responds to ping, but that's all;
nothing can be done at the console except rebooting. Usually, when this
node is fenced off, the fencing node freezes shortly afterwards as well.
What I could find in the kernel log of this second machine is as follows:
Mar 26 23:09:24 lxserv1 kernel: dlm: closing connection to node 1
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replayed 65 of 85 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: replays = 65, skips = 12, sames = 8
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replayed 888 of 994 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: replays = 888, skips = 66, sames = 40
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Journal replayed in 1s
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Done
Does this indicate anything that could help fix the cluster?
Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary