[linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace
Andreas Pflug
andreas.pflug at web.de
Wed Jun 5 17:29:22 UTC 2013
On 06/05/13 17:13, David Teigland wrote:
> On Wed, Jun 05, 2013 at 03:23:32PM +0200, Andreas Pflug wrote:
> A few different topics wrapped together there:
>
> - With kill -9 clvmd (possibly combined with dlm_tool leave clvmd),
> you can manually clear/remove a userland lockspace like clvmd.
I had some clvmd instances not starting up correctly, remaining in
nowhereland...
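For reference, the manual cleanup from the first point could be sketched
roughly as below. This is only a sketch under assumptions: it presumes
dlm_controld is still running (so it would not apply to a lockspace that
is already uncontrolled), that it is run as root, and that the lockspace
is named clvmd as above. cleanup_clvmd and is_uninterruptible are made-up
helper names. It stops if clvmd sits in uninterruptible sleep (state D),
since kill -9 has no effect until the process leaves that sleep.

```shell
#!/bin/sh
# Sketch only: manual cleanup of the clvmd userland lockspace.
# Assumption: dlm_controld is still running and this is run as root.

# Succeeds if a ps(1) state string denotes uninterruptible sleep ("D...").
is_uninterruptible() {
    case "$1" in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: kill every clvmd process, then leave the lockspace.
cleanup_clvmd() {
    for pid in $(pgrep -x clvmd); do
        # Read the process state column without a header.
        state=$(ps -o stat= -p "$pid" | tr -d ' ')
        if is_uninterruptible "$state"; then
            echo "clvmd pid $pid is in state $state; kill -9 will not work yet" >&2
            return 1
        fi
        kill -9 "$pid"
    done
    # Ask dlm_controld to leave the lockspace named "clvmd".
    dlm_tool leave clvmd
    # List the remaining lockspaces to check the result.
    dlm_tool ls
}
```

Sourcing the file only defines the two functions; nothing is killed
until cleanup_clvmd is actually called.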
>
> - If clvmd is blocked in the kernel in uninterruptible sleep, then
> the kill above will not work. To make kill work, you'd locate the
> particular sleep in the kernel and determine if there's a way to
> make it interruptible, and cleanly back it out.
>
> - If clvmd is blocked in the kernel for >120s, you probably want to
> investigate what is causing that, rather than being too hasty
> killing clvmd.
>
> - If corosync or dlm_controld are killed while dlm lockspaces exist,
> they become "uncontrolled" and would need to be forcibly cleaned up.
> This cleanup may be possible to implement for userland lockspaces,
> but it's not been clear that the benefits would greatly outweigh
> using reboot for this.
Any of those programs might run into a problem, so either they should
re-attach to the lockspace, or a cleanup should be possible. If (as in
my case) the host is a Xen host with SAN storage, you wouldn't want to
reboot it... In my naive imagination, an orphaned lockspace is just some
allocated memory that shouldn't be too hard to free.
>
> - Killing either corosync or dlm_controld is very unlikely to help
> anything, and more likely to cause further problems, so it should
> be avoided as far as possible.
Apparently the problem started with corosync running correctly, but
dlm_controld wasn't up; clvmd then blocked somewhere. I still have
four hosts with 60 VMs or so to reboot, so any hint on how to kill that
lockspace is greatly appreciated.
Regards,
Andreas