[libvirt] [PATCH] docs: add a mention for start a vm with rawio = 'yes'

lhuang lhuang at redhat.com
Tue Mar 3 02:13:57 UTC 2015


On 03/02/2015 06:43 PM, Daniel P. Berrange wrote:
> On Mon, Mar 02, 2015 at 06:04:44PM +0800, Luyao Huang wrote:
>> When we start a VM that has rawio='yes' set but no file
>> capabilities set on qemu, the qemu process still cannot use
>> this capability (CAP_SYS_RAWIO), and /proc/<qemu pid>/status
>> looks like this:
>>
>>    CapInh: 0000000000020000
>>    CapPrm: 0000000000000000
>>    CapEff: 0000000000000000
>>    CapBnd: 0000001fffffffff
>>
>> This is because we do not set file capabilities on qemu (see
>> man 7 capabilities). Although Laine mentioned this in commit
>> e11451, I think it would be good to add it to the docs.
> This is only true if you are starting the guest under the
> qemu:///session URI. In such a case I think it is expected
> that the QEMU lacks rawio capabilities, because the whole
> point of qemu:///session is that the VM has no elevated
> privileges.
>
> In the case of qemu:///system libvirt should ensure that
> it does the right thing with passing on raw io capability
> flag. If it does not, then we must fix that in the code,
> not the docs.

Hmm, what I showed is the test result with qemu:///system. We already
set the right capability flag before calling execv() or execve(); however,
in most cases we run the qemu process as qemu (uid 107), not root (uid 0),
so setting these capability flags alone does not let qemu use the
capability, because, from capabilities(7):

    Transformation of capabilities during execve()
        During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:

            P'(permitted) = (P(inheritable) & F(inheritable)) |
                            (F(permitted) & cap_bset)

            P'(effective) = F(effective) ? P'(permitted) : 0

            P'(inheritable) = P(inheritable)    [i.e., unchanged]

        where:

            P         denotes the value of a thread capability set before the execve(2)

            P'        denotes the value of a capability set after the execve(2)

            F         denotes a file capability set

            cap_bset  is the value of the capability bounding set (described below).

So if no file capabilities are set on the qemu binary (/usr/libexec/qemu-kvm),
the qemu process ends up with these capability sets:

Uid:    107    107    107    107
Gid:    107    107    107    107
...
CapInh:    0000000000020000
CapPrm:    0000000000000000
CapEff:    0000000000000000
CapBnd:    0000001fffffffff

and the qemu process does not have the capability, because CapEff (the
effective set) is what the kernel uses for permission checks.

I think libvirt already does the right thing here, even though the
running qemu process does not have the rawio capability in this case.
I don't think it is a good idea for libvirt to set file capabilities
on the qemu binary: libvirt is not the only program that uses or calls
qemu, so setting a file capability on the qemu binary would affect the
other callers too (although setting a small file capability would not be
a big deal :) ). So I guess it is better to let users set this themselves
instead of having libvirt use cap_set_file() to do it.

BTW, if we run the qemu process with uid and gid root (0), the capability
sets look like this:
...
Uid:    0    0    0    0
Gid:    0    0    0    0
...
CapInh:    0000000000020000
CapPrm:    0000000000020000
CapEff:    0000000000020000
CapBnd:    0000000000020000

>
> Regards,
> Daniel

Thanks,
Luyao




More information about the libvir-list mailing list