[Linux-cluster] GFS problems!!!
Steven Dake
sdake at redhat.com
Wed Oct 10 02:07:47 UTC 2007
Please include /var/log/messages from one system, as well as the output
of group_tool dump from one of the crashed nodes. What brand/model of
switch are you using?
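
As a rough aid for reading the cman_tool services output quoted below, here is a small Python sketch that lists the groups whose state is not "none" and the node IDs missing from each one. The function names (parse_services, stuck_groups) are made up for illustration, and treating the fence domain's member list as the "expected" membership is an assumption, not something cman_tool itself reports.

```python
# Sample copied from the quoted `cman_tool services` output in this thread
# (quote markers removed). The last group has no member line in the post.
SAMPLE = """\
type level name id state
fence 0 default 00010004 none
[1 2 3 4 5 6 7 8 9]
dlm 1 clvmd 00010003 none
[1 2 3 4 5 6 7 8 9]
dlm 1 mdi_log 00020001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
dlm 1 deploy 00040001 FAIL_START_WAIT
[1 4 6 7 8 9]
gfs 2 mdi_log 00010001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
gfs 2 deploy 00030001 FAIL_START_WAIT
"""


def parse_services(text):
    """Parse `cman_tool services` text into a list of group dicts."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        if line.startswith("["):
            # A bracketed line lists the members of the preceding group.
            if groups:
                groups[-1]["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        fields = line.split()
        if len(fields) == 5:
            gtype, level, name, gid, state = fields
            groups.append({"type": gtype, "level": int(level), "name": name,
                           "id": gid, "state": state, "members": None})
    return groups


def stuck_groups(groups):
    """Yield (type, name, state, missing) for groups not in state 'none'.

    Assumption: the fence domain's members are the expected node set.
    `missing` is None when the output showed no member list for the group.
    """
    fence = next((g for g in groups if g["type"] == "fence"), None)
    expected = set(fence["members"] or []) if fence else set()
    for g in groups:
        if g["state"] == "none":
            continue
        if g["members"] is None:
            missing = None
        else:
            missing = sorted(expected - set(g["members"]))
        yield g["type"], g["name"], g["state"], missing


if __name__ == "__main__":
    for gtype, name, state, missing in stuck_groups(parse_services(SAMPLE)):
        print(gtype, name, state, "missing nodes:", missing)
```

Running it against a fresh capture from each node would show whether the same node IDs are absent everywhere, which may help decide which node's logs to look at first.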
Regards
-steve
On Tue, 2007-10-09 at 16:52 -0700, James Fillman wrote:
> Ok. I'm trying to implement GFS on two different clusters: one with 9
> nodes and one with 17 nodes.
>
> I'm having nothing but trouble. The GFS volumes are freezing and
> throwing the cluster into a bad state. Currently, this is the state of
> my cluster:
>
> [root@plxp01md-new log]# cman_tool services
> type             level name     id       state
> fence            0     default  00010004 none
> [1 2 3 4 5 6 7 8 9]
> dlm              1     clvmd    00010003 none
> [1 2 3 4 5 6 7 8 9]
> dlm              1     mdi_log  00020001 FAIL_START_WAIT
> [1 2 3 4 6 7 8 9]
> dlm              1     deploy   00040001 FAIL_START_WAIT
> [1 4 6 7 8 9]
> gfs              2     mdi_log  00010001 FAIL_START_WAIT
> [1 2 3 4 6 7 8 9]
> gfs              2     deploy   00030001 FAIL_START_WAIT
>
> I have no idea what happened. Users who are writing to a GFS volume
> just reported to me that the volume is not responding.
> /var/log/messages has been logging the following message, about 50
> times a second, since Friday:
>
> Oct 9 13:54:35 plxp01deploy kernel: dlm: recover_master_copy -53 401ce
>
> Can someone tell me what FAIL_START_WAIT means and how to recover from
> it? Also, does anyone know what the log message above means?
>
> All my servers in the cluster are showing the same service states.
>
> I'm running 64-bit RHEL 5.
>
> Please help. I'm almost ready to give up on GFS. It seems way too
> unstable.
>
> James Fillman
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster