[Linux-cluster] rgmanager is jammed
Fabio M. Di Nitto
fdinitto at redhat.com
Sat May 26 07:05:37 UTC 2012
On 05/25/2012 06:20 PM, Nicolas Ross wrote:
> I am in the process of upgrading one of our clusters from RHEL 6.1 to
> 6.2. It's an 8-node cluster.
>
> I started with one node: stop all cluster resources, cman, rgmanager et
> al., yum update, reboot, move to the next. The first one went OK.
>
> On the second one, rgmanager started, but doesn't seem to connect to
> the other nodes. I found this in dmesg:
>
> INFO: task rgmanager:2901 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rgmanager D 0000000000000000 0 2901 2900 0x00000080
> ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
> ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
> ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
> Call Trace:
> [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
> [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
> [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
> [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
> [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
> [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
> [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
> [<ffffffff81176918>] vfs_write+0xb8/0x1a0
> [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
> [<ffffffff81177321>] sys_write+0x51/0x90
> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>
> I tried rebooting, but the shutdown stalled on stopping rgmanager. I
> fenced the node, same outcome.
>
> Any hints?
This looks like a kernel dlm problem. I can see you found a workaround,
but that should not be necessary since upgrades between releases should
work.
Can you please file a ticket with GSS and escalate it? It might be a good
idea to grab sosreports before those logs are rotated away.
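
To illustrate why the trace points at dlm, here is a minimal Python sketch
that pulls hung-task reports out of saved dmesg text and lists the kernel
modules named in the stack frames. The function names (parse_hung_tasks,
modules_in_trace) and the regular expressions are my own assumptions, written
against the format of the trace quoted above. They are not part of any
cluster tool, and other kernel versions may print frames differently.

```python
import re

# Header printed by the kernel hung-task detector, as in the quoted dmesg output.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# A stack frame such as "[<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]".
# A leading "? " marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)

# Excerpt of the trace from the original report, used only as sample input.
SAMPLE = """\
INFO: task rgmanager:2901 blocked for more than 120 seconds.
Call Trace:
 [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
 [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
 [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
 [<ffffffff81176918>] vfs_write+0xb8/0x1a0
"""


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            # Store (symbol, module or None, True if the frame is not a "?" guess).
            current["frames"].append(
                (frame.group("sym"), frame.group("mod"), frame.group("guess") is None)
            )
    return reports


def modules_in_trace(report):
    """List modules named in reliable frames, in order of first appearance."""
    seen = []
    for _sym, mod, reliable in report["frames"]:
        if reliable and mod and mod not in seen:
            seen.append(mod)
    return seen


if __name__ == "__main__":
    for rep in parse_hung_tasks(SAMPLE):
        print(rep["comm"], rep["pid"], rep["secs"], modules_in_trace(rep))
```

Run against the full dmesg output, this gives a quick summary of which tasks
are blocked and in which module, which may be handy to paste into the ticket
alongside the sosreports.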
Thanks
Fabio