[Linux-cluster] Probably some silly mistake setting up a cluster ?

Kevin Anderson kanderso at redhat.com
Wed Feb 27 22:39:38 UTC 2008


On Wed, 2008-02-27 at 16:12 +0100, Petr Tuma wrote:
> Greetings,
> 
> I am trying to set up a cluster with (for now) two nodes, reason being
> the semantic guarantees of GFS when accessing shared files (that is, I
> am not interested in fault tolerance, performance or anything else).
> Unfortunately, I keep running into all sorts of problems, for
> example:
> 
>     - After a few hours of intensive workload, the cluster sometimes
> simply stops. All file system calls block, but things like cman_tool
> status or group_tool status insist everything is all right. A soft reboot
> is not possible because various services wait indefinitely, and after
> power cycling, fsck finds inconsistencies on the file system.

It would be helpful to generate a stack trace of all processes when this
happens to see what they are waiting on.
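One way to see what the blocked processes are waiting on is sketched below in Python. It assumes a Linux /proc file system. Tasks stuck in a file system call normally sit in uninterruptible sleep (state D), so the script lists those tasks with their wait channel and, where readable, their kernel stack. /proc/<pid>/stack exists only on newer kernels and usually requires root, and the helper names are hypothetical. As root, `echo t > /proc/sysrq-trigger` is an alternative that dumps the stacks of all tasks to the kernel log.

```python
"""List processes in uninterruptible sleep (state D) and where they wait.

Sketch assuming a Linux /proc file system; /proc/<pid>/stack usually
needs root (and a newer kernel), so it is read only when permitted.
"""
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def read_text(path):
    """Read a /proc file, returning None if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def blocked_processes(proc="/proc"):
    """Yield (pid, comm, wchan, stack) for every task in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if stat is None:
            continue  # process exited while we were scanning
        pid, comm, state = parse_stat(stat)
        if state != "D":
            continue
        wchan = read_text(os.path.join(proc, entry, "wchan")) or "?"
        stack = read_text(os.path.join(proc, entry, "stack"))  # None unless root
        yield pid, comm, wchan.strip(), stack


if __name__ == "__main__":
    for pid, comm, wchan, stack in blocked_processes():
        print(f"{pid} {comm} waiting in {wchan}")
        if stack:
            print(stack.rstrip())
```

Running it while the file system calls hang shows which kernel functions the stuck processes are sleeping in, which is the information needed to tell whether they are waiting on GFS locking or somewhere else.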
  
> 
>     - Sometimes, when trying to execute a binary on the file system, I get
> execvp returning permission denied when it should not, but when I try
> again, everything is all right. I sometimes even observe this when
> trying to start a script on the file system, as if the interpreter of
> the script (which is on a different file system altogether) had wrong
> permissions. Again, simply trying one more time makes everything work.

This is a defect that has been fixed in the upstream version of GFS2.
Which version are you using?  In general, although much better than it
used to be, GFS2 is not yet stable.  You should use gfs-kmod if you are
looking for a stable solution.
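To answer the version question, a minimal Python sketch like the one below reports the kernel release, which GFS kernel modules are loaded and which file systems are mounted as gfs or gfs2. It assumes a Linux /proc file system and that the modules are named `gfs` and `gfs2`. It does not query package versions (for example of gfs-kmod), and the function names are hypothetical.

```python
"""Report which GFS variant is loaded and mounted on this node.

Sketch assuming a Linux /proc file system and modules named gfs/gfs2.
"""
import os

GFS_NAMES = ("gfs", "gfs2")


def loaded_gfs_modules(modules_text):
    """Return the GFS kernel modules listed in /proc/modules content."""
    names = (line.split()[0] for line in modules_text.splitlines() if line.strip())
    return sorted(n for n in names if n in GFS_NAMES)


def gfs_mounts(mounts_text):
    """Return (mount point, fs type) pairs for GFS entries in /proc/mounts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in GFS_NAMES:
            result.append((fields[1], fields[2]))
    return result


def read_proc(name):
    """Read a file directly under /proc."""
    with open(os.path.join("/proc", name)) as f:
        return f.read()


if __name__ == "__main__":
    print("kernel:", os.uname().release)
    print("modules:", loaded_gfs_modules(read_proc("modules")) or "none")
    for mount_point, fs_type in gfs_mounts(read_proc("mounts")):
        print(f"{fs_type} mounted on {mount_point}")
```

If the shared file system turns out to be mounted as gfs2, switching to gfs from gfs-kmod is the stable alternative suggested above.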

Thanks
Kevin