[Linux-cluster] disabling DLM and GFS kernel modules

Chris Harms chris at cmiware.com
Tue Sep 18 15:00:06 UTC 2007


The only other thing I can think of is that I started ntpd, and since it
had not been running there was likely a large time adjustment.

Sep 17 10:27:32 ntpd[1118]: synchronized to 206.222.28.90, stratum 2
Sep 17 15:53:38 ntpd[1118]: time reset +18217.299628 s
Sep 17 15:53:38 ntpd[1118]: kernel time sync enabled 0001
Sep 17 15:53:38 openais[4457]: [TOTEM] The token was lost in the 
OPERATIONAL state.
Sep 17 15:53:38 dlm_controld[4480]: cluster is down, exiting
Sep 17 15:53:38 gfs_controld[4486]: cluster is down, exiting
Sep 17 15:53:38 fenced[4474]: cluster is down, exiting
Sep 17 15:53:38 kernel: dlm: closing connection to node 1
Sep 17 15:53:48 named[8732]: *** POKED TIMER ***
Sep 17 15:53:48 named[8733]: *** POKED TIMER ***
Sep 17 15:54:04 ccsd[4437]: Unable to connect to cluster infrastructure 
after 30 seconds.
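That +18217 s reset is a roughly five-hour step, which is more than enough to make openais lose the totem token. One way to avoid this (a sketch of boot ordering, not a tested recipe; the `cman` service name is the RHEL 5 cluster init script and is assumed here) is to step the clock once before the cluster stack starts, so ntpd never has to make a large correction while the cluster is running:

```shell
# Hypothetical boot-order fragment (e.g. in an init script), assuming the
# reference ntpd and the RHEL 5 cman init script:
ntpd -gq             # -g permits one large initial step, -q sets the clock and exits
service cman start   # join the cluster only after the clock is sane
ntpd                 # then run ntpd normally; remaining corrections are slewed
```

With the clock already correct at join time, ntpd's ongoing adjustments stay small and are slewed rather than stepped.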



David Teigland wrote:
> On Tue, Sep 18, 2007 at 09:34:45AM -0500, Chris Harms wrote:
>   
>> It said something about an out of memory condition.   This was logged 
>> just prior to where it would have panicked:
>>
>> groupd[9639]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
>> groupd[9639]: local node must be reset to clear 1 uncontrolled instances 
>> of gfs and/or dlm
>> openais[9625]: [CMAN ] cman killed by node 1 because we were killed by 
>> cman_tool or other application
>> fenced[9647]: cman_init error 0 111
>> dlm_controld[9653]: cman_init error 0 111
>> gfs_controld[9659]: cman_init error 111
>>     
>
> These messages mean that the userspace cluster software all exited for
> some unknown reason, leaving behind a dlm lockspace (in the kernel) from
> rgmanager.  At this point, you needed to reboot the machine, but instead
> you restarted the userspace cluster software, which rightly complained
> that you hadn't rebooted the machine, and refused to operate.
>
> This probably doesn't help, though, because it doesn't tell us anything
> about the original problem(s) you had.  The original problem(s) probably
> caused the cluster software to exit the first time, and was probably
> related to the runaway processes.
>
>
>   
>> There were 2 runaway processes related to GFS / DLM before I tried to 
>> shut it down.  We had not encountered any issues like this until now.  
>> The only changes to our setup were a superficial change to some cluster 
>> services, and an upgrade of the DRBD kernel module.
>>
>> Kevin Anderson wrote:
>>     
>>> On Mon, 2007-09-17 at 17:50 -0500, Chris Harms wrote:
>>>       
>>>> Is there an easy way to disable GFS and related kernel modules if one 
>>>> does not need GFS?  We are running the 5.1 Beta 1 version of the cluster 
>>>> and had a mysterious crash of the cluster suite.  There were issues with 
>>>> the GFS and dlm modules.  The kernel panicked on shutdown.
>>>>
>>>>    
>>>>         
>>> Do you have any details on the panic?
>>>
>>> Kevin
>>> ------------------------------------------------------------------------
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>       
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>     
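As a sketch of what groupd is checking in the messages quoted above: lockspaces left behind in the kernel show up as entries under /sys/kernel/dlm (the path groupd reported). A hypothetical check you could run by hand before restarting the cluster stack, assuming that sysfs layout:

```shell
# If the userspace cluster software died but the kernel still holds a dlm
# lockspace (e.g. rgmanager's), it appears under /sys/kernel/dlm.
if [ -d /sys/kernel/dlm ] && [ -n "$(ls -A /sys/kernel/dlm 2>/dev/null)" ]; then
    echo "uncontrolled dlm lockspaces present -- reboot before restarting cman:"
    ls /sys/kernel/dlm
else
    echo "no leftover dlm lockspaces"
fi
```

If anything is listed there while the cluster daemons are down, a reboot is the only safe way to clear it, which is exactly why groupd refuses to start.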

