[Linux-cluster] Problems with GFS2 faulting.

Steven Whitehouse swhiteho at redhat.com
Thu Jan 13 20:52:38 UTC 2011


Hi,

On Thu, 2011-01-13 at 10:41 -0500, Eric Renfro wrote:
> It's not RHEL, as I stated in my post. It's Ubuntu 10.04.1 with the 
> ubuntu-ha-maintainer PPA for pacemaker 1.0.8, open-iscsi 2.0.871, and 
> gfs2-tools 3.0.7.
> 
> I have also, as I stated, tried without multipathed iSCSI, using a 
> single iSCSI target for the nodes having problems, with the same 
> result. The issue is strictly with GFS2, somehow after locking and 
> unlocking files. In fact, it can't be iSCSI at all, because the root 
> filesystems of both nodes are iSCSI targets provided by KVM on the host 
> OS, and they have shown no iSCSI-related problems. If the hang were 
> caused by iSCSI blocking, I'm sure it would affect the root 
> filesystem as well.
> 
> Eric Renfro
> 
I don't know a lot about Pacemaker, and in fact the pcmk versions of
dlm_controld and gfs_controld have been removed from more recent
packages. Can you recreate the issue using the "normal" dlm_controld
and gfs_controld?

The backtrace you posted earlier seemed to show a process stuck
waiting for a glock. The next step in debugging is to figure out which
glock it is waiting for, and why. If you could get a dump of the glocks
from debugfs, that should help point the way.
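A minimal sketch of collecting that dump follows. It assumes debugfs is
mounted at /sys/kernel/debug, that GFS2 exposes one glocks file per
filesystem under gfs2/<clustername:fsname>/ there, and that it runs as
root. The helper names dump_glocks and waiting_holders are hypothetical,
and the filter assumes the W holder flag means "waiting for the request
to complete", as described in the kernel's GFS2 glocks documentation.

```shell
#!/bin/sh
# Sketch: collect GFS2 glock state from debugfs on one node.
# Assumptions (not from the original thread): the kernel has debugfs and
# GFS2 debugfs support, debugfs is mounted, and this runs as root.
# dump_glocks and waiting_holders are hypothetical helper names.

# dump_glocks [ROOT]
#   ROOT is the debugfs mount point (default /sys/kernel/debug). Each
#   mounted GFS2 filesystem is expected at ROOT/gfs2/<clustername:fsname>/glocks.
#   Prints every glocks file after a header line; returns 1 if none was found.
dump_glocks() {
    root=${1:-/sys/kernel/debug}
    found=0
    for f in "$root"/gfs2/*/glocks; do
        [ -r "$f" ] || continue   # unmatched glob or unreadable file
        found=1
        echo "=== $f ==="
        cat "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no GFS2 glocks files under $root/gfs2" >&2
        return 1
    fi
}

# waiting_holders [ROOT]
#   Prints each glock ("G:" line) that has at least one holder ("H:" line)
#   whose flags field (f:...) contains W, followed by those holder lines.
#   Assumes W marks a holder still waiting for the lock to be granted.
waiting_holders() {
    dump_glocks "$1" | awk '
        /^=== / { next }                          # skip per-file headers
        /^ *G:/ { glock = $0; shown = 0; next }   # remember current glock
        /^ *H:/ && / f:[^ ]*W/ {
            if (!shown) { print glock; shown = 1 }
            print
        }'
}
```

If debugfs is not already mounted, `mount -t debugfs none /sys/kernel/debug`
should mount it. Saving the `dump_glocks` output from every node, not just
the one with the hung task, is probably worthwhile, since the lock a node
is waiting on may be held by another node. If the blocked flusher thread
(flush-251:1, pid 27497 in the hung-task message) is waiting on a glock,
it should show up in `waiting_holders` as a holder with p:27497.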

Steve.


> On 1/13/2011 10:13 AM, Gordan Bobic wrote:
> > Eric Renfro wrote:
> >
> >> Here's the stack traces I'm getting when it faults:
> >>
> >> Jan 13 03:31:27 cweb1 kernel: [1387920.160141] INFO: task 
> >> flush-251:1:27497 blocked for more than 120 seconds.
> >> Jan 13 03:31:27 cweb1 kernel: [1387920.160802] "echo 0 > 
> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >
> > As the error says, this isn't an actual kernel oops; it means that 
> > something in your stack (likely the iSCSI implementation, since that is 
> > the one constant throughout what you tested) is blocking somewhere.
> >
> > What versions of RHEL, GFS2 and iSCSI are you using? My guess is that 
> > iSCSI might be hitting a race somewhere and locking up. Have you 
> > tried connecting both clients to just a single server via iSCSI to see 
> > if the problem goes away?
> >
> > Gordan
> >
> > -- 
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 




