[Linux-cluster] DLM Problem

Patrick Caulfield pcaulfie at redhat.com
Wed Jan 30 08:40:31 UTC 2008


isplist at logicore.net wrote:
> New thread since this is a DLM issue :)
> 
>> As I understand it, basically the first node to access a file must do some
>> lock checks - see if anyone has already locked it, attempt to lock it, let
>> the other nodes know about it, etc.
> 
> 
> I think I see what's going on. I see the following error in my logs which 
> suggests to me that every connection is being checked since locking is not 
> happening.
> 
> Jan 29 11:24:04 compdev kernel: GFS: fsid=compweb:web.0: jid=3: replays = 0, 
> skips = 1, sames = 24
> Jan 29 11:24:04 compdev kernel: GFS: fsid=compweb:web.0: jid=3: Journal 
> replayed in 1s
> Jan 29 11:24:04 compdev kernel: GFS: fsid=compweb:web.0: jid=3: Done
> Jan 29 11:30:10 compdev kernel: dlm: could not bind to local address for 
> connect: -98
> Jan 29 11:35:40 compdev kernel: dlm: could not bind to local address for 
> connect: -98
> Jan 29 11:38:35 compdev kernel: dlm: could not bind to local address for 
> connect: -98
> 
> I'm not finding a lot on google about how to go about finding the problem, 
> fixing it.

Error -98 is EADDRINUSE. It means that something else is using port
21064, the TCP port that the DLM uses. If the DLM can't bind to its port
then it cannot start.
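
A rough way to check this from user space, as a sketch only: the Python
below tries to bind a TCP socket to a port and reports whether the kernel
refuses with EADDRINUSE. The function name port_is_taken is made up for
illustration, and this says nothing about how the DLM itself does its
bind.

```python
import errno
import socket

DLM_TCP_PORT = 21064  # the DLM port named in this message


def port_is_taken(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration; not part of any cluster tool.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other failure, not "address already in use"
    finally:
        sock.close()
    return False
```

Calling port_is_taken(DLM_TCP_PORT) on the affected node should return
True while something holds the port. Note that a DLM that is already
running on that port will also make it return True, so it only tells you
the port is held, not by whom.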

Use netstat -tap or lsof to find out what is using that port. If you
can't stop the application that is using it, then you'll need to move
the DLM to another port ON ALL CLUSTER NODES by echoing a port number
into /proc/cluster/dlm/tcp_port.
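
If netstat or lsof is not to hand, the same socket table is readable
from /proc/net/tcp. The sketch below, in Python, picks out the entries
whose local port matches. It is only an illustration: sockets_on_port is
a made-up name, it covers IPv4 only (IPv6 sockets are in /proc/net/tcp6),
and it returns the socket inode rather than the owning program, which
you would still have to match against /proc/<pid>/fd.

```python
def sockets_on_port(proc_net_tcp_text, port):
    """Return (local_address, state, inode) for each socket on the given local port.

    proc_net_tcp_text is the full text of /proc/net/tcp. Addresses and
    ports in that file are hexadecimal; state "0A" means LISTEN.
    """
    hits = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue  # blank or truncated line
        local_addr, state, inode = fields[1], fields[3], fields[9]
        local_port = int(local_addr.rsplit(":", 1)[1], 16)
        if local_port == port:
            hits.append((local_addr, state, inode))
    return hits
```

On a live node you would pass it open("/proc/net/tcp").read() and 21064.
Port 21064 shows up in that file as 5248 in hex.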

Patrick