[PATCH 4/4] qemu_passt: Don't let passt fork off

Thu Feb 16 08:52:27 UTC 2023

On 2/15/23 19:30, Stefano Brivio wrote:
> On Wed, 15 Feb 2023 18:04:56 +0100
> Michal Prívozník <mprivozn at redhat.com> wrote:
> 
>> On 2/15/23 08:50, Laine Stump wrote:
>>> On 2/14/23 8:02 AM, Stefano Brivio wrote:  
>>>> On Tue, 14 Feb 2023 12:51:22 +0100
>>>> Michal Privoznik <mprivozn at redhat.com> wrote:
>>>>  
>>>>> When passt starts it tries to do some security measures to
>>>>> restrict itself. For instance, it creates its own namespaces,
>>>>> umounts basically everything, drops capabilities, forks off to
>>>>> further restrict itself (the child is where all interesting work
>>>>> takes place now). This is sound, except it's causing two
>>>>> problems:
>>>>>
>>>>> 1) the PID file FD, which we leak into the passt process, gets
>>>>>     closed (and thus our virPidFile*() helpers see unlocked PID
>>>>>     file, which makes them think the process is gone),  
>>>>
>>>> I didn't realise this was the case, but giving passt write (unless I'm
>>>> missing something) access to a file created by libvirtd doesn't look
>>>> desirable to me.  
>>>   
>>>>  
>>>>> 2) the PID file no longer reflects true PID of the process.
>>>>>
>>>>> Worse, the child calls setsid() so we can't even kill the whole
>>>>> process group. I mean, we can but it won't be any good.  
>>>
>>> I think that (incorrect PID in the pidfile) is happening because Michal
>>> is using the original version of my patches that were pushed - I had
>>> mimicked the behavior of slirp, where libvirt daemonizes the new
>>> process. If that process then daemonizes itself, we have some sort of
>>> "double daemon"; libvirt has saved off the pid of what it thinks is
>>> going to be the final process, but then that process further forks and
>>> exits from the process whose pid libvirt saved. But because passt was
>>> cleaning up after itself I hadn't noticed the discrepancy in pids when
>>> testing.
>>>
>>> Without going into all the details of the pidfile and locking and etc, I
>>> just want to say that if we can fork/exec dnsmasq and let it daemonize
>>> itself and create its own pidfile, then certainly we can do the same
>>> thing for passt. (and if there's a fundamental problem, then it's a
>>> fundamental problem for dnsmasq as well).  
>>
>> Alright. I think I have a solution that would please everybody involved.
>> I'll post it tomorrow though. I need to test it thoroughly. We would be
>> able to get passt's PID (which is needed not only for killing it, but
>> also for CGroup placement), NOT use --foreground and still pass errors
>> from it to users (that is unless logfile was specified, because
>> unfortunately, --log-file and --stderr are mutually exclusive).
> 
> That doesn't need to be the case (--log-file and --stderr being
> mutually exclusive)... if you have a use case for it, let's change that
> in passt. I just wanted to keep it simple for users ("give a log file,
> and be sure it won't spam").
> 
> Also mind that Laine's series:
>   https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@redhat.com/

Thanks, this looks exactly like what we need. So for now I can just pass
--stderr if there's no --log-file, to deal with those "releases" that
don't have those patches merged yet.
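
A minimal sketch of that fallback, in Python for brevity rather than
libvirt's C. The function name passt_log_args is hypothetical and only
illustrates the selection logic; the two flags are the ones discussed
in this thread:

```python
from typing import List, Optional


def passt_log_args(log_file: Optional[str]) -> List[str]:
    """Return the logging-related arguments for a passt command line.

    Older passt releases treat --log-file and --stderr as mutually
    exclusive, so exactly one of the two is emitted.
    """
    if log_file:
        # An explicit log file wins; do not add --stderr alongside it.
        return ["--log-file", log_file]
    # No log file configured: ask passt to report errors on stderr so
    # that they can be relayed to the user.
    return ["--stderr"]
```

Once --log-file and --stderr can be combined in passt, the else branch
could simply always add --stderr.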

> 
> *should* already cover all the cases where libvirt is interested in
> relaying "early" errors back to the user.
> 
> By the way, the one below is pretty much the patch I would have proposed
> for libvirt. I prepared it earlier today and didn't have a chance to
> test it yet, it's compile-tested only, and doesn't take cgroups into
> account (which, it seems, is needed no matter the lifecycle).
> 
> So I'm sharing it here as reference (that's how simple I wanted it to
> be -- minus cgroups), or if it's convenient for you to copy and paste
> something.
> 

That patch effectively disables placing passt into the CGroup set up
for the emulator thread, and I don't think we want that. Firstly, it
makes statistics gathering report incorrect values. Secondly, these
helper processes are an "implementation detail" - I mean, users don't
really care (from an accounting POV) whether a task runs in the emulator
thread inside of QEMU or in a separate process. It's still emulation and
as such should be accounted for. Also, on NUMA machines we definitely
want to place passt as close to the emulator as possible (i.e. if the
emulator thread is pinned then helper processes should be pinned too).
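
To illustrate the pinning point only: on Linux, a helper can be given
the same CPU affinity as the emulator with the sched_getaffinity() and
sched_setaffinity() calls. This is a rough Python sketch, not what
libvirt does internally; mirror_affinity is a hypothetical name and the
PIDs are whatever the caller has at hand:

```python
import os
from typing import Set


def mirror_affinity(src_pid: int, dst_pid: int) -> Set[int]:
    """Copy the CPU affinity mask of src_pid onto dst_pid (Linux only).

    A PID of 0 refers to the calling process.
    """
    # Read the set of CPUs the source process is allowed to run on.
    mask = os.sched_getaffinity(src_pid)
    # Restrict the destination process to the same set of CPUs.
    os.sched_setaffinity(dst_pid, mask)
    return mask
```

Placing the helper into the emulator's CGroup achieves the same effect
for controllers such as cpuset without a per-process call.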

Furthermore, CGroup placement enhances security, because libvirt sets
up the devices controller in such a way that only devices from the
domain XML are allowed and everything else is forbidden.

I could go on with other controllers but I believe you get the picture.
Now true, for qemu:///session we don't set any CGroups as we lack the
permissions to do so [1], and this is probably the target audience for
this feature anyway, but for qemu:///system (when running
libvirtd/virtqemud as root) we do set up CGroups and MUST place helper
processes into them. I mean, if we are concerned about security (just
look at the discussion about --foreground), then CGroups are definitely
a step in the right direction.
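
For reference, with cgroup v2 moving a process into a group comes down
to writing its PID into that group's cgroup.procs file. A minimal
Python sketch, assuming cgroup v2 and sufficient privileges;
add_pid_to_cgroup and the directory argument are hypothetical and this
is not libvirt's actual code:

```python
import os


def add_pid_to_cgroup(cgroup_dir: str, pid: int) -> None:
    """Move process `pid` into the cgroup v2 directory `cgroup_dir`."""
    # Each cgroup v2 directory exposes a cgroup.procs file; writing a
    # PID to it migrates that process into the group.
    procs = os.path.join(cgroup_dir, "cgroup.procs")
    with open(procs, "w") as f:
        f.write(f"{pid}\n")
```

This is why we need passt's real PID: a PID that belongs to a parent
which has already exited cannot be placed anywhere.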

1: and even this might change in the future as there are some plans to
let a privileged component create the CGroup for us (e.g. systemd).

Michal