[libvirt] New QEMU daemon for persistent reservations

Sun Sep 10 09:38:39 UTC 2017

On 28/08/2017 13:11, Michal Privoznik wrote:
> On 08/25/2017 12:41 AM, Paolo Bonzini wrote:
>> On 22/08/2017 18:27, Paolo Bonzini wrote:
>>> Hi all,
> 
> Hey, sorry for late reply. I was enjoying my PTO by not reading e-mails :-)

Me too!

>>>
>>> I am adding a new daemon to QEMU, that QEMU can connect to in order to
>>> issue persistent reservation commands.
> 
> Persistent reservation of what?

SCSI persistent reservation (PR) commands let multiple initiators
coordinate access to the same device.

>> Thinking more about it, Libvirt could start the daemon on its own,
>> assigning a socket per virtual machine.  SELinux MCS should then just
>> work, because the same category is assigned to the daemon instance and QEMU.
> 
> Whoa, libvirt is not really prepared for qemu spawning processes, or
> having more than one process per qemu domain. But it shouldn't be that
> hard to prepare internal structs for that.

As Daniel remarked, QEMU should not be spawning processes with elevated
privileges.  In fact, it definitely should not be doing that in this
case.  The two use cases are:

1) for user libvirtd, the daemon is started by init and uses a global
socket.  To talk to the daemon, the user must have been given access to
the socket by the administrator via Unix groups or ACLs or whatever.
Once you have access, you can send PRs to whatever device you have
access to (unless the administrator has also configured e.g. devices
cgroups for the daemon, but this is outside libvirt's scope).  While
this mode can be used for system libvirtd too, it would be less secure
and it would be hard to make it work with SELinux, so...

2) ... for system libvirtd, a copy of the daemon is started for each VM
by libvirtd, together with QEMU.  This mode cannot be used for user
libvirtd because the helper is _not_ suid root.  The daemon can then be
placed in the same devices cgroup and SELinux MCS category as QEMU.

>> In particular, Libvirt could create the socket itself, label it, and
>> pass it to the daemon through the systemd socket activation protocol
>> (which qemu-pr-helper supports).
> 
> We can pass FDs to qemu (in fact any process), even if it's running. So
> that shouldn't be a problem.

Yeah, the systemd socket activation protocol should be trivial to
support in libvirtd (compared to other changes to use >1 process per VM).
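
To illustrate what the parent side of that protocol involves, here is a
minimal sketch in Python (not libvirt code).  It assumes the usual
sd_listen_fds(3) convention: the listening socket is handed over as file
descriptor 3, with LISTEN_FDS set to the number of sockets and LISTEN_PID
set to the child's PID.  The function names and the socket path are
hypothetical, and labeling the socket is left out.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd in the socket activation convention


def make_listener(path):
    """Create a listening Unix stream socket at `path` (hypothetical location)."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    return sock


def spawn_activated(argv, listener):
    """Fork and exec `argv` with `listener` passed as fd 3; return the child PID."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = listener.fileno()
            if fd != SD_LISTEN_FDS_START:
                # The duplicate created by os.dup2 is inheritable by default.
                os.dup2(fd, SD_LISTEN_FDS_START)
            else:
                os.set_inheritable(fd, True)
            # LISTEN_PID must name the process that finally receives the fds,
            # so it is set after fork(), just before exec.
            env = dict(os.environ, LISTEN_FDS="1", LISTEN_PID=str(os.getpid()))
            os.execvpe(argv[0], argv, env)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    return pid
```

The parent keeps its own copy of the listening socket, so in principle it
could hand the same socket to a replacement helper later.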

>>
>> The XML to use the helper with a predefined socket could be:
>>
>> 	<disk ...>
>> 	   <pr mode='connect'>/path/to/unix.socket</pr>
>> 	</disk>
> 
> Do we want to/need to expose the path here? I mean, is user expected to
> do something with it? We don't expose monitor path anywhere but keep it
> private (of course we store it in so called status XML which is a
> persistent storage solely for purpose of reloading the internal state
> when daemon is restarted).

In this case, yes.  This is for the case of a global daemon.

>>
>> while to use it with a dedicated daemon
>>
>> 	<disk ...>
>> 	   <pr mode='private'>/path/to/qemu-pr-helper</pr>
>> 	</disk>
> 
> 
> Ah, so there isn't 1:1 relationship with qemu process and the daemon
> helper. One daemon can serve multiple qemu processes, am I right?

Yes, but it need not be the case.
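
To make the two proposed forms concrete, here is a rough Python sketch of
how a management layer might tell them apart.  The <pr> element is only
the proposal from this thread, not an existing libvirt schema, and the
function name and the returned action strings are invented for
illustration.

```python
import xml.etree.ElementTree as ET


def pr_helper_plan(disk_xml):
    """Return (action, path) for a <disk>'s proposed <pr> child, or None."""
    disk = ET.fromstring(disk_xml)
    pr = disk.find("pr")
    if pr is None:
        return None
    mode = pr.get("mode")
    path = (pr.text or "").strip()
    if mode == "connect":
        # Global daemon: QEMU connects to an already existing socket.
        return ("connect-to-socket", path)
    if mode == "private":
        # Dedicated daemon: the helper binary is started for this VM.
        return ("spawn-helper", path)
    raise ValueError(f"unknown pr mode: {mode!r}")
```

With mode='connect' the text is a socket path; with mode='private' it is
the path of the helper binary to start.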

> Also, how would libvirt know if the daemon helper dies?

The daemon shouldn't die.  Famous last words, sure, but even if the
daemon dies and isn't respawned, only persistent reservation commands
fail; everything else works normally.

> I mean, if libvirt is
> to start it again (there are some systemd-less distros), we have to know
> that such situation happened. For instance, we can get an event on the
> monitor to which we start the daemon and pass new FD to its socket to
> qemu? Although, this would mean a significant work on libvirt side in
> case there's 1:N relationship. Because on delivery of the event from two
> domains we have to figure out if the domains are supposed to have their
> own daemons or one shared.

Yeah, this is why for the 1:N model (user libvirtd) I prefer to just
leave it to systemd; systemd-less distros will rely on the famous last
words above.  For the 1:1 model (system libvirtd) libvirtd
_could_ respawn, but it is not a big deal if it doesn't.

> Also, what happens when the daemon dies? What's the qemu state at that
> point? Is the guest still running or is it paused (e.g. like on ENOSPC
> error on disks)?

See above: only persistent reservation commands fail, and everything
else keeps working normally.

Paolo