[Linux-cluster] Re: SMP and GFS

Axel Thimm Axel.Thimm at ATrpms.net
Sun Oct 2 10:23:05 UTC 2005


On Thu, Jul 14, 2005 at 04:57:51PM -0400, Manuel Bujan wrote:
> Is there any issue I should be aware of if SMP is enabled in my
> kernel? What if I compile my kernel to be preemptible? Any problem
> with that and GFS?
> 
> I am running GFS on a dual Xeon server from Dell.
>
> After a long time running my GFS setup I got the following error
> on one of our cluster servers, and I had to reboot it in order to
> reestablish the service:
>
> #################################################################################
> Jul 14 14:19:35 atmail-2 kernel:  2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (18044) req reply einval ae2c0092 fr 1 r 1        2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (31381) req reply einval bf9901e7 fr 1 r 1        2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (2023) req reply einval d6c30333 fr 1 r 1        2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 send einval to 1
> Jul 14 14:19:35 atmail-2 last message repeated 2 times

I found similar log snippets on a RHEL4U1 machine with dual Xeons (HP
ProLiant). The machine crashed with a kernel panic shortly after
telling the other nodes to leave the cluster (sorry, the staff was
under pressure and no one wrote down the panic's output):

Sep 30 05:08:11 zs01 kernel: nval to 1 (P:kernel)
Sep 30 05:08:11 zs01 kernel: data send einval to 1 (P:kernel)
Sep 30 05:08:11 zs01 kernel: Magma send einval to 1 (P:kernel)
Sep 30 05:08:11 zs01 kernel: data send einval to 1 (P:kernel)
Sep 30 05:08:11 zs01 kernel: Magma send einval to 1 (P:kernel)
Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : Missed too many heartbeats (P:kernel)
Sep 30 05:08:39 zs03 kernel: CMAN: removing node zs01 from the cluster : No response to messages (P:kernel)
Sep 30 05:08:45 zs03 kernel: CMAN: quorum lost, blocking activity (P:kernel)

Searching for the einval messages, I found only this post, so it
doesn't seem to happen that often. OTOH both reports involve dual
Xeon machines; perhaps dual Xeons are not good for GFS and/or the
cluster infrastructure?

In my case the kernel and GFS bits are all from Red Hat, with no
self-built components other than a qla2xxx driver, but the issue is
on the cluster communication side.
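
For anyone who wants to check their own logs for the same pattern,
here is a minimal sketch in Python. It is not part of GFS or CMAN; it
only assumes the syslog layout seen in the excerpts above
("Mon DD HH:MM:SS host kernel: message"). The function name scan and
both regular expressions are my own illustration.

```python
import re
from collections import Counter

# Assumed syslog layout: "Sep 30 05:08:11 zs01 kernel: <message>"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s*(.*)$"
)
# CMAN removal message as it appears in the excerpt above;
# the trailing "(P:kernel)" tag is treated as optional.
CMAN_RE = re.compile(
    r"CMAN: removing node (\S+) from the cluster : (.+?)(?: \(P:kernel\))?$"
)


def scan(lines):
    """Count 'einval' kernel messages per host and collect CMAN removals.

    Returns (einval_counts, removals), where removals is a list of
    (timestamp, reporting_host, removed_node, reason) tuples.
    """
    einval_counts = Counter()
    removals = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a kernel line in the assumed layout
        stamp, host, message = match.groups()
        if "einval" in message:
            einval_counts[host] += 1
        removed = CMAN_RE.search(message)
        if removed:
            removals.append((stamp, host, removed.group(1), removed.group(2)))
    return einval_counts, removals
```

Feeding it an open log file, for example scan(open(path)) with path
pointing at wherever your syslog writes kernel messages, gives a
per-host einval count plus the list of node removals, which makes it
easier to see whether the einval burst precedes the removals on each
node. Lines truncated before the word "einval" are not counted.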
-- 
Axel.Thimm at ATrpms.net