[Linux-cluster] Deadlock when using clvmd + OpenAIS + Corosync

Thu Jan 21 15:17:25 UTC 2010

On Wed, Jan 13, 2010 at 4:59 AM, Christine Caulfield
<ccaulfie at redhat.com> wrote:
> On 12/01/10 16:21, Evan Broder wrote:
>>
>> On Tue, Jan 12, 2010 at 3:54 AM, Christine Caulfield
>> <ccaulfie at redhat.com>  wrote:
>>>
>>> On 11/01/10 09:38, Christine Caulfield wrote:
>>>>
>>>> On 11/01/10 09:32, Evan Broder wrote:
>>>>>
>>>>> On Mon, Jan 11, 2010 at 4:03 AM, Christine Caulfield
>>>>> <ccaulfie at redhat.com>  wrote:
>>>>>>
>>>>>> On 08/01/10 22:58, Evan Broder wrote:
>>>>>>>
>>>>>>> [please preserve the CC when replying, thanks]
>>>>>>>
>>>>>>> Hi -
>>>>>>> We're attempting to set up a clvm (2.02.56) cluster using OpenAIS
>>>>>>> (1.1.1) and Corosync (1.1.2). We've been bitten hard in the past by
>>>>>>> crashes that left DLM state behind and forced us to reboot our nodes,
>>>>>>> so we're specifically looking for a solution that doesn't involve
>>>>>>> in-kernel locking.
>>>>>>>
>>>>>>> We're also running the Pacemaker OpenAIS service, as we're hoping to
>>>>>>> use it for management of some other resources going forward.
>>>>>>>
>>>>>>> We've managed to form the OpenAIS cluster and get clvmd running on
>>>>>>> both of our nodes. LVM operations succeed so long as only one
>>>>>>> operation runs at a time. However, if we attempt to run two
>>>>>>> operations at once (say, one lvcreate on each host), both hang, and
>>>>>>> both clvmd processes appear to deadlock.
>>>>>>>
>>>>>>> When they deadlock, it doesn't appear to affect the other clustering
>>>>>>> processes - both corosync and pacemaker still report a fully formed
>>>>>>> cluster, so it seems the issue is localized to clvmd.
>>>>>>>
>>>>>>> I've looked at logs from corosync and pacemaker, and I've straced
>>>>>>> various processes, but I don't want to blast a bunch of useless
>>>>>>> information at the list. What information can I provide to make it
>>>>>>> easier to debug and fix this deadlock?
>>>>>>>
>>>>>>
>>>>>> To start with, the most useful logs are the clvmd logs, which can be
>>>>>> obtained with clvmd -d (see the man page for details). Ideally these
>>>>>> should come from all nodes in the cluster so they can be correlated.
>>>>>> If you're still using DLM, then a DLM lock dump from all nodes is
>>>>>> often helpful in conjunction with the clvmd logs.
>>>>>
>>>>> Sure, no problem. I've posted the logs from both clvmd processes in
>>>>> <http://web.mit.edu/broder/Public/clvmd/>. I've annotated them at a
>>>>> few points with what I was doing; the annotations all start with
>>>>> ">>", so they should be easy to spot.
>>>
>>>
>>> Ironically, it looks like a bug in the clvmd-openais code. I can
>>> reproduce it on my systems here. I don't see the problem when using the
>>> DLM!
>>>
>>> Can you try -Icorosync and see if that helps? In the meantime I'll have a
>>> look at the openais bits to try and find out what is wrong.
>>>
>>> Chrissie
>>>
>>
>> I'll see what we can pull together, but the nodes running the clvm
>> cluster are also Xen dom0s. They're currently running Ubuntu Hardy's
>> 2.6.24 kernel, so upgrading them to something new enough to support
>> DLM 3 would be... challenging.
>>
>> It would be much, much better for us if we could get clvmd-openais
>> working.
>>
>> Is there any chance this would work better if we dropped back to
>> openais whitetank instead of corosync + openais wilson?
>>
>
>
> OK, I've found the bug and it IS in openais. The attached patch will fix it.
>
> Chrissie
>

Awesome. That patch fixed our problem.

We are running into one other problem: LVM operations on one node
(torchwood-institute) take roughly 7 seconds, more than 20 times longer
than the same operations on the other node (black-mesa):

root@black-mesa:~# time lvcreate -n test -L 1G xenvg
  Logical volume "test" created

real	0m0.309s
user	0m0.000s
sys	0m0.008s
root@black-mesa:~# time lvremove -f /dev/xenvg/test
  Logical volume "test" successfully removed

real	0m0.254s
user	0m0.004s
sys	0m0.008s

root@torchwood-institute:~# time lvcreate -n test -L 1G xenvg
  Logical volume "test" created

real	0m7.282s
user	0m6.396s
sys	0m0.312s
root@torchwood-institute:~# time lvremove -f /dev/xenvg/test
  Logical volume "test" successfully removed

real	0m7.277s
user	0m6.420s
sys	0m0.292s

Any idea why this is happening, and whether there's anything we can do about it?

Thanks again for your help,
 - Evan