[Cluster-devel] fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed

Mon Feb 26 20:55:25 UTC 2007

OK, I've filed bz 230143 to help track this.  I'm currently working on a
umount panic, but once I figure that out I will try to reproduce and
work on this issue.

Josef
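
A minimal sketch of the sequence David describes below (create a file on
one node, then delete it quickly from the other), in case it helps anyone
trying to reproduce this. It is not known to trigger the assertion. The
function names, the use of ssh to reach the second node, and the assumption
that the filesystem is mounted at the same path on both nodes are all
hypothetical and not taken from the report.

```python
import os
import shlex
import subprocess
import time


def build_delete_command(peer_host, path):
    """Return the ssh command that removes `path` on the peer node.

    Assumption: the GFS2 filesystem is mounted at the same path on both nodes.
    """
    return ["ssh", peer_host, "rm", "-f", "--", shlex.quote(path)]


def create_then_remote_delete(mount_point, peer_host, name,
                              delay=0.0, run=subprocess.run):
    """Create a file locally, then delete it from the peer node.

    `run` is injectable so the sequence can be exercised without a cluster.
    """
    path = os.path.join(mount_point, name)
    with open(path, "w") as f:
        f.write("x\n")
        f.flush()
        os.fsync(f.fileno())  # push the data out before the peer touches it
    time.sleep(delay)  # the report says a long delay avoids the failure
    return run(build_delete_command(peer_host, path), check=False)


def hammer(mount_point, peer_host, count, delay=0.0, run=subprocess.run):
    """Repeat the create/remote-delete cycle `count` times."""
    for i in range(count):
        create_then_remote_delete(mount_point, peer_host,
                                  "repro_%d" % i, delay=delay, run=run)
```

Something like hammer("/mnt/gfs2", "node-b", 100) would run the cycle 100
times with no delay; both the mount point and the host name here are
placeholders.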

On Mon, Feb 26, 2007 at 03:43:04PM -0000, David Craigon wrote:
> This happens when I create a file on one computer, then quickly delete
> it on the other.
> 
> It doesn't happen if 1) I wait a long time between creating the file
> and deleting it, or 2) I delete the file on the same computer I made
> it on, no matter how fast I do it. I seem to be able to create files
> on both computers as much as I like.
> 
> David
> 
> 
> > -----Original Message-----
> > From: Josef Whiter [mailto:jwhiter at redhat.com] 
> > Sent: 23 February 2007 17:59
> > To: David Craigon
> > Cc: cluster-devel at redhat.com
> > Subject: Re: [Cluster-devel] fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
> > 
> > On Fri, Feb 23, 2007 at 04:17:57PM -0000, David Craigon wrote:
> > > Hello,
> > > 
> > > I'm trying to use GFS2. I'm trying to use all the latest parts, so
> > > I've tried it using Fedora 7 test1 with a checkout from CVS, and
> > > I've also tried Fedora 6. I have an equal lack of success with both.
> > > 
> > > My setup is a simple cluster featuring two servers attached using
> > > open-iSCSI to a backend SAN. The iSCSI part works fine: I have the
> > > drive as a device on both computers. I'm using the iSCSI that comes
> > > with the Linux distro. I've turned off SELinux. I'm using DLM locking.
> > > 
> > > When I've got both servers attached to it, it works for a short
> > > while (circa 10 seconds or so). What I typically do is create files
> > > and then delete them from the two servers. After a while I get this:
> > > 
> > > Feb 23 15:54:28 a kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "alpha_cluster:a"
> > > Feb 23 15:54:28 a kernel: GFS2: fsid=alpha_cluster:a.0: Joined cluster. Now mounting FS...
> > > Feb 23 15:54:28 a kernel: GFS2: fsid=alpha_cluster:a.0: jid=0, already locked for use
> > > Feb 23 15:54:28 a kernel: GFS2: fsid=alpha_cluster:a.0: jid=0: Looking at journal...
> > > Feb 23 15:54:28 a kernel: GFS2: fsid=alpha_cluster:a.0: jid=0: Done
> > > Feb 23 15:54:55 a kernel: GFS2: fsid=alpha_cluster:a.0: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
> > > Feb 23 15:54:55 a kernel: GFS2: fsid=alpha_cluster:a.0:   function = gfs2_meta_inval, file = fs/gfs2/meta_io.c, line = 101
> > > Feb 23 15:54:55 a kernel: GFS2: fsid=alpha_cluster:a.0: about to withdraw this file system
> > > Feb 23 15:54:55 a kernel: GFS2: fsid=alpha_cluster:a.0: telling LM to withdraw
> > > 
> > > At that point, this server can't look at the mount point anymore.
> > > 
> > > Can anyone offer any assistance?
> > >  
> > 
> > I hit this same bug as well, but haven't gone back to try and
> > reproduce it yet. Could you possibly try to narrow down an exact (or
> > heck, even general) sequence of commands that will trigger the
> > problem? If not, open a bugzilla and CC me on it and I'll try to get
> > some time next week to reproduce it again.  Thanks,
> > 
> > Josef
> > 
>