[Linux-cluster] Deadlock when using clvmd + OpenAIS + Corosync

Christine Caulfield ccaulfie at redhat.com
Fri Jan 22 11:41:59 UTC 2010


On 21/01/10 15:17, Evan Broder wrote:
> On Wed, Jan 13, 2010 at 4:59 AM, Christine Caulfield
> <ccaulfie at redhat.com>  wrote:
>> On 12/01/10 16:21, Evan Broder wrote:
>>>
>>> On Tue, Jan 12, 2010 at 3:54 AM, Christine Caulfield
>>> <ccaulfie at redhat.com>    wrote:
>>>>
>>>> On 11/01/10 09:38, Christine Caulfield wrote:
>>>>>
>>>>> On 11/01/10 09:32, Evan Broder wrote:
>>>>>>
>>>>>> On Mon, Jan 11, 2010 at 4:03 AM, Christine Caulfield
>>>>>> <ccaulfie at redhat.com>    wrote:
>>>>>>>
>>>>>>> On 08/01/10 22:58, Evan Broder wrote:
>>>>>>>>
>>>>>>>> [please preserve the CC when replying, thanks]
>>>>>>>>
>>>>>>>> Hi -
>>>>>>>> We're attempting to set up a clvm (2.02.56) cluster using OpenAIS
>>>>>>>> (1.1.1) and Corosync (1.1.2). We've been bitten hard in the past by
>>>>>>>> crashes leaving DLM state around and forcing us to reboot our nodes,
>>>>>>>> so we're specifically looking for a solution that doesn't involve
>>>>>>>> in-kernel locking.
>>>>>>>>
>>>>>>>> We're also running the Pacemaker OpenAIS service, as we're hoping to
>>>>>>>> use it for management of some other resources going forward.
>>>>>>>>
>>>>>>>> We've managed to form the OpenAIS cluster, and get clvmd running on
>>>>>>>> both of our nodes. Operations using LVM succeed, so long as only one
>>>>>>>> operation runs at a time. However, if we attempt to run two
>>>>>>>> operations
>>>>>>>> (say, one lvcreate on each host) at a time, they both hang, and both
>>>>>>>> clvmd processes appear to deadlock.
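>>>>>>>>
>>>>>>>> For concreteness, something like the following triggers the hang (LV
>>>>>>>> names are just placeholders; xenvg is our volume group):
>>>>>>>>
>>>>>>>>   # on each node, at roughly the same time:
>>>>>>>>   root@nodeA:~# lvcreate -n test-a -L 100M xenvg
>>>>>>>>   root@nodeB:~# lvcreate -n test-b -L 100M xenvg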
>>>>>>>>
>>>>>>>> When they deadlock, it doesn't appear to affect the other clustering
>>>>>>>> processes - both corosync and pacemaker still report a fully formed
>>>>>>>> cluster, so it seems the issue is localized to clvmd.
>>>>>>>>
>>>>>>>> I've looked at logs from corosync and pacemaker, and I've straced
>>>>>>>> various processes, but I don't want to blast a bunch of useless
>>>>>>>> information at the list. What information can I provide to make it
>>>>>>>> easier to debug and fix this deadlock?
>>>>>>>>
>>>>>>>
>>>>>>> To start with, the best logging to produce is the clvmd logs, which
>>>>>>> can be obtained with clvmd -d (see the man page for details).
>>>>>>> Ideally these should come from all nodes in the cluster so they can
>>>>>>> be correlated. If you're still using the DLM, then a DLM lock dump
>>>>>>> from all nodes is often helpful in conjunction with the clvmd logs.
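>>>>>>>
>>>>>>> Something like this on each node should capture both (exact flags
>>>>>>> are in the clvmd and dlm_tool man pages; the file names are just
>>>>>>> examples):
>>>>>>>
>>>>>>>   # run clvmd in the foreground with debug output:
>>>>>>>   clvmd -d 2> /tmp/clvmd-$(hostname).log
>>>>>>>
>>>>>>>   # on a dlm_controld-based stack, dump the DLM lock state too:
>>>>>>>   dlm_tool lockdump clvmd > /tmp/dlm-$(hostname).txt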
>>>>>>
>>>>>> Sure, no problem. I've posted the logs from clvmd on both processes in
>>>>>> <http://web.mit.edu/broder/Public/clvmd/>. I've annotated them at a
>>>>>> few points with what I was doing - the annotations all start with
>>>>>> ">> ", so they should be easy to spot.
>>>>
>>>>
>>>> Ironically, it looks like a bug in the clvmd-openais code. I can
>>>> reproduce it on my systems here. I don't see the problem when using
>>>> the dlm!
>>>>
>>>> Can you try -Icorosync and see if that helps? In the meantime I'll have a
>>>> look at the openais bits to try and find out what is wrong.
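>>>>
>>>> That is, restart clvmd on all nodes with the corosync interface
>>>> selected explicitly, e.g.
>>>>
>>>>   clvmd -Icorosync -d
>>>>
>>>> (clvmd -h should list the cluster interfaces your binary was built
>>>> with.)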
>>>>
>>>> Chrissie
>>>>
>>>
>>> I'll see what we can pull together, but the nodes running the clvm
>>> cluster are also Xen dom0s. They're currently running on (Ubuntu
>>> Hardy's) 2.6.24, so upgrading them to something new enough to support
>>> DLM 3 would be...challenging.
>>>
>>> It would be much, much better for us if we could get clvmd-openais
>>> working.
>>>
>>> Is there any chance this would work better if we dropped back to
>>> openais whitetank instead of corosync + openais wilson?
>>>
>>
>>
>> OK, I've found the bug and it IS in openais. The attached patch will fix it.
>>
>> Chrissie
>>
>
> Awesome. That patch fixed our problem.
>
> We are running into one other problem - performing LVM operations on
> one node is substantially slower than performing them on the other
> node:
>
> root@black-mesa:~# time lvcreate -n test -L 1G xenvg
>    Logical volume "test" created
>
> real	0m0.309s
> user	0m0.000s
> sys	0m0.008s
> root@black-mesa:~# time lvremove -f /dev/xenvg/test
>    Logical volume "test" successfully removed
>
> real	0m0.254s
> user	0m0.004s
> sys	0m0.008s
>
>
> root@torchwood-institute:~# time lvcreate -n test -L 1G xenvg
>    Logical volume "test" created
>
> real	0m7.282s
> user	0m6.396s
> sys	0m0.312s
> root@torchwood-institute:~# time lvremove -f /dev/xenvg/test
>    Logical volume "test" successfully removed
>
> real	0m7.277s
> user	0m6.420s
> sys	0m0.292s
>
> Any idea why this is happening and if there's anything we can do about it?
>


I'm not at all sure why that should be happening. I suppose the best 
thing to do would be to enable clvmd logging (clvmd -d) and see what is 
taking the time.
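
Since nearly all of the seven seconds on the slow node is user CPU time
rather than time spent waiting, the timestamps in the debug log should
at least show whether the delay is inside clvmd at all. Something like
this on the slow node (the log file name is just an example):

  clvmd -d 2> /tmp/clvmd-debug.log &
  time lvcreate -n test -L 1G xenvg
  # then look for large gaps between consecutive timestamps in the log

Running lvcreate with -vvvv as well will show which step on the LVM
side is slow.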

Chrissie



