[libvirt] [PATCH 5/5] rpc: switch virtlockd and virtlogd to use single-threaded dispatch

Jim Fehlig jfehlig at suse.com
Wed Mar 7 15:48:35 UTC 2018


On 03/07/2018 06:07 AM, Daniel P. Berrangé wrote:
> On Wed, Mar 07, 2018 at 10:10:29AM +0000, Daniel P. Berrangé wrote:
>> On Tue, Mar 06, 2018 at 04:46:05PM -0700, Jim Fehlig wrote:
>>> On 03/06/2018 10:58 AM, Daniel P. Berrangé wrote:
>>>> Currently both virtlogd and virtlockd use a single worker thread for
>>>> dispatching RPC messages. Even this is overkill: their RPC message
>>>> handling callbacks all run in short, finite time, so blocking the
>>>> main loop is not an issue the way it is in libvirtd with long-running
>>>> QEMU commands.
>>>>
>>>> By setting max_workers==0, we can turn off the worker thread and run
>>>> these daemons single-threaded. This in turn fixes a serious problem in
>>>> the virtlockd daemon whereby it loses all fcntl() locks at re-exec
>>>> because multiple threads exist. fcntl() locks are only preserved if the
>>>> process is single-threaded at the time of exec().
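The workaround rests on one precondition: the process must have exactly one thread when it calls exec(). A minimal sketch of a pre-exec guard follows, assuming Linux (it reads the `Threads:` field of `/proc/self/status`). The helpers `count_threads()`, `safe_to_reexec()` and `start_idle_worker()` are hypothetical illustrations, not libvirt APIs; the idle thread only stands in for the single RPC worker thread the daemons used before this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a libvirt API): return the number of threads
 * in the calling process, read from the "Threads:" field of
 * /proc/self/status (Linux-specific), or -1 on error. */
static int count_threads(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    int n = -1;

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        /* Only the "Threads:" line converts the %d successfully. */
        if (sscanf(line, "Threads: %d", &n) == 1)
            break;
    }
    fclose(fp);
    return n;
}

/* Hypothetical pre-exec guard: treat a re-exec as safe for held fcntl()
 * locks only when the process has exactly one thread. */
static int safe_to_reexec(void)
{
    return count_threads() == 1;
}

/* Stand-in for a dispatch worker thread: it just blocks forever. */
static void *idle_worker(void *opaque)
{
    (void)opaque;
    for (;;)
        pause();
    return NULL;
}

/* Hypothetical helper: start one idle worker, mimicking a daemon that
 * dispatches RPC messages on a worker thread. Returns 0 on success,
 * or an error number from pthread_create() on failure. */
static int start_idle_worker(pthread_t *tid)
{
    return pthread_create(tid, NULL, idle_worker, NULL);
}
```

Before any worker exists, `safe_to_reexec()` returns true; once `start_idle_worker()` has succeeded the thread count is 2 and it returns false. A daemon could use a check like this to warn, or to refuse, before re-exec'ing itself while holding locks.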
>>>
>>> I suppose this change has no effect when, e.g., starting many domains in
>>> parallel with locking enabled. Even before the change, there was only one
>>> worker thread to process requests.
>>>
>>> I've tested the series and locks are now preserved across re-execs of
>>> virtlockd. The question is whether we want this change or should pursue
>>> fixing the underlying kernel bug.
>>>
>>> FYI, via the non-public bug I asked a glibc maintainer about the lost lock
>>> behavior. He agreed it is a kernel bug and posted the below comment to the
>>> bug.
>>>
>>> Regards,
>>> Jim
>>>
>>> First, I agree that POSIX file record locks (i.e. the fcntl F_SETLK ones, which
>>> you're using) _are_ to be preserved over execve (absent any FD_CLOEXEC of
>>> course, which you aren't using).  Relevant quote from fcntl(2):
>>>
>>>         Record locks are not inherited by a child created via fork(2),
>>>         but are preserved across an execve(2).
>>>
>>> Second, I agree that the existence or non-existence of threads must not play
>>> a role in the above.
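The record-lock semantics quoted from fcntl(2) can be illustrated with a small sketch, assuming Linux/POSIX. The helper names `lock_whole_file()` and `lock_holder_seen_by_child()` are hypothetical, not libvirt functions. The parent takes a whole-file write lock with `F_SETLK`; a forked child then queries with `F_GETLK` and reports the owning PID, showing that the lock stays with the parent rather than being inherited by the child. It does not exercise the execve() case or the multi-threaded re-exec problem discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive POSIX record lock on the whole file with F_SETLK
 * (non-blocking). Returns 0 on success, -1 with errno set on failure. */
static int lock_whole_file(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 = up to end of file, however large */
    };
    return fcntl(fd, F_SETLK, &fl);
}

/* Fork a child that asks, via F_GETLK, which process would block a
 * whole-file write lock. Returns the PID the child saw, 0 if it saw no
 * conflicting lock, or -1 on error. Since record locks are not
 * inherited across fork(), a lock held by the caller shows up in the
 * child as a conflict owned by the caller's PID. */
static pid_t lock_holder_seen_by_child(int fd)
{
    int pipefd[2];
    pid_t child;
    pid_t holder = -1;

    if (pipe(pipefd) < 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (child == 0) {
        /* l_start/l_len default to 0: query the whole file. */
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        pid_t seen = -1;

        if (fcntl(fd, F_GETLK, &fl) == 0)
            seen = (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
        if (write(pipefd[1], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);
    }

    close(pipefd[1]);
    if (read(pipefd[0], &holder, sizeof(holder)) != (ssize_t)sizeof(holder))
        holder = -1;
    close(pipefd[0]);
    waitpid(child, NULL, 0);
    return holder;
}
```

For example, on a file descriptor from `mkstemp()`, `lock_holder_seen_by_child(fd)` is expected to return 0 before the lock is taken and the parent's `getpid()` after a successful `lock_whole_file(fd)`.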
>>
>> I've asked some Red Hat experts too, and they also think it looks like a
>> kernel bug. The question is whether this is a recent kernel regression that
>> is easily fixed, or a long-term problem.
>>
>> I've at least verified that this broken behaviour existed in RHEL-7 (though
>> it's possible it was backported when OFD locks were implemented). I still
>> want to test RHEL-6 and RHEL-5 to see how far back this problem goes.
> 
> I've checked RHEL-6 and RHEL-5 and both are affected, so this is a
> long-standing Linux problem and we'll need to work around it.

We keep some vintage distros around for long-term support, and I managed to
"bisect" the problem a bit: the reproducer works on kernel 2.6.16 but breaks
on 2.6.32.

> FYI, I've got a kernel bug open here to track it from the RHEL side:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=1552621

Thanks!

Regards,
Jim



