[libvirt] [PATCH v3 09/12] qemu: Start PR daemon on domain startup
Daniel P. Berrangé
berrange at redhat.com
Thu Mar 22 10:28:28 UTC 2018
On Thu, Mar 22, 2018 at 11:16:47AM +0100, Michal Privoznik wrote:
> On 03/15/2018 01:31 PM, Peter Krempa wrote:
> > On Wed, Mar 14, 2018 at 17:05:38 +0100, Michal Privoznik wrote:
> >> Before we exec() qemu we have to spawn pr-helper processes for
> >> all managed reservations (well, technically there can be only one).
> >> The only caveat there is that we should place the process into
> >> the same namespace and cgroup as qemu (so that it shares the same
> >> view of the system). But we can do that only after we've forked.
> >> That means calling the setup function between fork() and exec().
> >>
> >> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> >> ---
> >> src/qemu/qemu_process.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 169 insertions(+)
> >>
>
>
> >
> >> +
> >> + priv->prPid = cpid;
> >> + ret = 0;
> >> + cleanup:
> >> + if (ret < 0) {
> >> + virCommandAbort(cmd);
> >> + virProcessKillPainfully(cpid, true);
> >> + }
> >> + virCommandFree(cmd);
> >> + if (pidfile) {
> >> + if (unlink(pidfile) < 0 &&
> >> + errno != ENOENT &&
> >> + !virGetLastError())
> >> + virReportSystemError(errno,
> >> + _("Cannot remove stale PID file %s"),
> >> + pidfile);
> >> + VIR_FREE(pidfile);
> >
> > By removing the pidfile, you can't make sure later that the PR helper
> > program is still the same process, thus when killing the VM by the pid
> > number only you might actually kill a different process.
> >
> > I unfortunately already deleted the top of the message so you'll need to
> > fix the function qemuProcessKillPRDaemon to see whether it's the same
> > process, similarly to what we do when reconnecting to qemu.
In general when pidfiles are locked using lockf()/fcntl(), you should never
unlink them unless you have first acquired the lock, and even then the code
that acquires the lock needs to be made robust against race conditions.
Our virPidFileAcquirePath copes with racing unlinks, but QEMU's code does
not. We should fix QEMU in this respect too.
>
> Okay, let me write here what I told you in person. I've tried couple of
> approaches to do proper locking of pidfile:
>
> a) run qemu-pr-helper -f $pidfile in combination with
> virPidFileForceCleanupPath(). This doesn't work because qemu-pr-helper
> uses lockf() to lock the pidfile. Libvirt uses F_SETLK. These two locks
> don't know about each other. Sure, we can adapt virPid* to do
> lockf()-ing. But we will fail miserably if qemu ever decides to change
> that to, say, fcntl().
I don't think that is correct - lockf() is just a thin wrapper around
fcntl() F_SETLK from what I understand. Many systems implement this
equivalence, although technically POSIX leaves it undefined.
> b) I've changed qemu-pr-helper to use fcntl() to lock the pidfile. They
> have a nice wrapper over it - qemu_lock_fd(). This still did not fly.
> Their wrapper prefers F_OFD_SETLK over F_SETLK. OFD stands for Open File
> Description Locks. The lock is not associated with the process but with
> the FD and thus survives fork(). Again, incompatible locks. If a file is
> locked with F_OFD_SETLK, another process can F_SETLK it successfully.
QEMU does the OFD_SETLK/SETLK magic because its APIs were primarily
designed for use with the block layer, where it must have sane semantics
to prevent accidental loss of the lock.
The limitations of F_SETLK are not really an issue for pidfiles which
are opened once and never closed. So it would be valid to change
QEMU to use fcntl(F_SETLK) exclusively for pidfiles. This would be
semantically compatible with lockf() on most systems, and get QEMU
away from under-specified semantics of lockf().
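The fork-survival property of OFD locks mentioned above can be demonstrated directly: the child inherits the parent's open file description and with it the lock, while a fresh open of the same file (a different OFD) still sees a conflict. A Linux-specific sketch, with a made-up path:

```c
#define _GNU_SOURCE           /* for F_OFD_SETLK on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/ofd-lock.demo";
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* OFD locks require l_pid == 0 (zero-initialized here). */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_OFD_SETLK, &fl) < 0) { perror("F_OFD_SETLK"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {
        /* The child shares the parent's open file description, so it
         * still holds the lock - it survived fork(). But probing from a
         * *fresh* open file description conflicts: */
        int fd2 = open(path, O_RDWR);
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fd2 >= 0 && fcntl(fd2, F_OFD_SETLK, &probe) < 0)
            printf("fresh fd: lock conflict, as expected\n");
        else
            printf("fresh fd: unexpectedly acquired the lock\n");
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    unlink(path);
    return 0;
}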
> c) My third attempt was to not rely on qemu-pr-helper locking at all
> (after all they can change it anytime which could break our tailored
> solution). So I've modified qemuProcessStartPRDaemonHook() so it calls
> virPidFileAcquirePath() and leaks the pidfile FD to qemu-pr-helper. This
> looked promising: command hooks are called just before exec(), after all
> FDs are closed. Unfortunately, virPidFileAcquirePath() sets FD_CLOEXEC
> flag, so no leaking is possible. Sure, I can call virSetInherit() to
> cancel the CLOEXEC flag. BUT, here is the proposal:
>
> IIUC the only problem we have with storing pid and killing it later is
> that we might end up killing a different process than qemu-pr-helper
> because qemu-pr-helper may die and another process might be spawned
> with its PID. Well, this would be solved with the events I'm mentioning
> in the cover letter. The idea is that qemu sends us an event as soon as
> qemu-pr-helper process dies. So we would spawn a new one and update the
> stored PID. This way we would never kill wrong process.
>
> BTW: don't look into qemuProcessReconnect. It suffers from the same
> problem. If libvirtd is stopped, qemu quits and another process is
> spawned with its PID, we kill the new process as soon as we try to
> reconnect to the monitor, because connecting to the monitor fails
> (obviously) so we jump onto the error label where qemuProcessStop() is
> called. Worse, if there's an OOM situation and we fail to asprintf() the
> pidfile path we jump onto the label too (and massacre the process we
> think is qemu).
>
>
> My proposal is to:
> 1) keep tracking the PID as it is now,
> 2) forbid starting domains with <reservations managed='yes'/> until the
> events are implemented.
>
> Alternatively, I can go with c) (in the end I have it working locally)
> and do all the changes necessary. Frankly, I don't have preference. I
> also wonder if the API design for events and the query-* command for
> reconnect might make us lean towards one of the options.
I would suggest doing both a) and b).
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|