[libvirt] [PATCH 5/5] rpc: switch virtlockd and virtlogd to use single-threaded dispatch

Wed Mar 7 10:10:29 UTC 2018

On Tue, Mar 06, 2018 at 04:46:05PM -0700, Jim Fehlig wrote:
> On 03/06/2018 10:58 AM, Daniel P. Berrangé wrote:
> > Currently both virtlogd and virtlockd use a single worker thread for
> > dispatching RPC messages. Even this is overkill, since their RPC message
> > handling callbacks all run in short, finite time, so blocking the
> > main loop is not an issue as it would be in libvirtd with long running
> > QEMU commands.
> > 
> > By setting max_workers==0, we can turn off the worker thread and run
> > these daemons single threaded. This in turn fixes a serious problem in
> > the virtlockd daemon whereby it loses all fcntl() locks at re-exec due
> > to multiple threads existing. fcntl() locks only get preserved if the
> > process is single threaded at time of exec().
> 
> I suppose this change has no effect when e.g. starting many domains in
> parallel when locking is enabled. Before the change, there was still only
> one worker thread to process requests.
> 
> I've tested the series and locks are now preserved across re-execs of
> virtlockd. Question is whether we want this change or pursue fixing the
> underlying kernel bug?
> 
> FYI, via the non-public bug I asked a glibc maintainer about the lost lock
> behavior. He agreed it is a kernel bug and posted the below comment to the
> bug.
> 
> Regards,
> Jim
> 
> First, I agree that POSIX file record locks (i.e. the fcntl F_SETLK ones, which
> you're using) _are_ to be preserved over execve (absent any FD_CLOEXEC of
> course, which you aren't using). Relevant quote from fcntl(2):
> 
>        Record locks are not inherited by  a  child  created  via  fork(2),
>        but  are  preserved  across  an execve(2).
> 
> Second I agree that the existence or non-existence of threads must not play
> a role in the above.

I've asked some Red Hat experts too and they suggest it looks like a kernel
bug. The question is whether this is a recent kernel regression that is easily
fixed, or whether it's a long-term problem.

I've at least verified that this broken behaviour exists in RHEL-7 (but it's
possible the bug was backported when OFD locks were implemented). I still want
to test RHEL-6 and RHEL-5 to see if this problem goes back indefinitely.

My inclination though is that we'll need to work around the problem in
libvirt regardless.
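
For anyone wanting to check a given kernel outside of libvirt, a rough
standalone reproducer could look something like the sketch below. This is
not code from the patch series, and the function names and the --threaded /
--after-exec flags are made up purely for illustration. It takes a POSIX
record lock, optionally starts a second thread, re-execs itself, and then
uses a forked child to probe whether the lock is still held (a process
cannot detect its own lock with a second lock attempt, and fork()ed
children do not inherit record locks).

```python
import fcntl
import os
import sys
import threading
import time


def lock_file(path):
    """Open path and take an exclusive POSIX record lock on it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # Python opens fds close-on-exec by default; closing the fd at exec
    # would drop the lock regardless of any kernel behaviour.
    os.set_inheritable(fd, True)
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def held_by_other_process(path):
    """Return True if a forked child is refused the lock on path."""
    pid = os.fork()
    if pid == 0:
        status = 2  # unexpected failure, e.g. the file could not be opened
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0  # lock was free
            except OSError:
                status = 1  # lock is held by someone else
        finally:
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus) == 1


def main(argv):
    path = argv[1]
    if "--after-exec" in argv:
        # Same PID as before the exec; ask a child whether we still own it.
        held = held_by_other_process(path)
        print("lock preserved" if held else "lock lost")
        return
    lock_file(path)
    if "--threaded" in argv:
        # Keep a second thread alive at the moment of exec.
        threading.Thread(target=time.sleep, args=(60,), daemon=True).start()
    script = os.path.abspath(__file__)
    os.execv(sys.executable, [sys.executable, script, path, "--after-exec"])


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved to a file and run once with just a lock file path, and once with
--threaded appended, the two runs can be compared. On a kernel showing the
behaviour discussed in this thread I would expect only the --threaded run
to report the lock as lost, but I have not verified that with this script.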

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|