[libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

Paolo Bonzini pbonzini at redhat.com
Mon Feb 5 19:38:24 UTC 2018


On 29/01/2018 14:03, Michal Privoznik wrote:
> On 01/26/2018 04:07 AM, John Ferlan wrote:
>>
>>
>> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>>> This is a definition that holds information on SCSI persistent
>>> reservation settings. The XML part looks like this:
>>>
>>>   <reservations enabled='yes' managed='no'>
>>>     <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
>>>   </reservations>
>>>
>>> If @managed is set to 'yes' then the <source/> is not parsed.
>>> This design was agreed on here:
>>>
>>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>>  docs/formatdomain.html.in                          |  25 +++-
>>>  docs/schemas/domaincommon.rng                      |  19 +--
>>>  docs/schemas/storagecommon.rng                     |  34 +++++
>>>  src/conf/domain_conf.c                             |  36 +++++
>>>  src/libvirt_private.syms                           |   3 +
>>>  src/util/virstoragefile.c                          | 148 +++++++++++++++++++++
>>>  src/util/virstoragefile.h                          |  15 +++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++++++
>>>  .../disk-virtio-scsi-reservations.xml              |  38 ++++++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>>>  .../disk-virtio-scsi-reservations.xml              |   1 +
>>>  tests/qemuxml2xmltest.c                            |   4 +
>>>  12 files changed, 348 insertions(+), 16 deletions(-)
>>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>>>  create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>>>
>>
>> Before digging too deep into this...
>>
>>  - I assume we're avoiding <disk> iSCSI mainly because those
>> reservations would take place elsewhere, safe assumption?
> 
> I believe so, but I'll let Paolo answer that. The way I understand
> reservations is that qemu needs to issue 'privileged' SCSI commands and
> thus for regular SCSI (which for purpose of this argument involves iSCSI
> emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process
> to which it'll pass the FD and which will issue the 'privileged' SCSI
> commands on qemu's behalf.

Yes.  There are two reasons for QEMU to access the helper.  First, in
order to be able to issue the command without CAP_SYS_RAWIO.  Second, in
order to access /dev/mapper/control and issue the command to all targets
in a multipath setup.

iSCSI in kernel, including multipath over iSCSI, is covered.  iSCSI in
userspace does not need qemu-pr-manager because QEMU 1) can just send
the command down a TCP socket without needing CAP_SYS_RAWIO and 2) does
not support multipath for iSCSI in userspace.
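
As a rough sketch of that rule (the enum and function names below are
made up for illustration; they are not libvirt or QEMU API), the
decision could be expressed as:

```c
#include <stdbool.h>

/* Hypothetical classification of how QEMU reaches the LUN.
 * These names are assumptions for this sketch only. */
typedef enum {
    PR_BACKEND_KERNEL_SCSI,      /* host SCSI device, incl. iSCSI in kernel */
    PR_BACKEND_KERNEL_MULTIPATH, /* multipath device, incl. over kernel iSCSI */
    PR_BACKEND_USERSPACE_ISCSI   /* iSCSI handled in userspace by QEMU */
} pr_backend;

/* Returns true when persistent reservation commands have to go
 * through the helper process. */
bool pr_helper_needed(pr_backend backend)
{
    switch (backend) {
    case PR_BACKEND_KERNEL_SCSI:
        /* Issuing the command directly would need CAP_SYS_RAWIO. */
        return true;
    case PR_BACKEND_KERNEL_MULTIPATH:
        /* The command must reach all targets, which requires access
         * to /dev/mapper/control. */
        return true;
    case PR_BACKEND_USERSPACE_ISCSI:
        /* The command goes down a TCP socket; no privilege needed and
         * no multipath support in this mode. */
        return false;
    }
    return false;
}
```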

>>  - What about using lun's from a storage pool (and what could become
>> your favorite, NPIV devices ;-))
>>
>>    <disk type='volume' device='lun'>
>>      <driver name='qemu' type='raw'/>
>>      <source pool='sourcepool' volume='unit:0:4:0'/>
>>      <target dev='sda' bus='scsi'/>
>>    </disk>
> 
> These should work too with my patches (not tested though - I don't have
> any real SCSI machine).
>
>>  - What about <hostdev>'s?
>>
>>    <hostdev mode='subsystem' type='scsi'>
>>
>>    but not iSCSI or vHost hostdev's. I think that creates the SCSI
>> generic LUN, but it's been a while since I've thought about the
>> terminology used for hostdev's...
> 
> I think these don't need the feature since qemu can access the device
> directly.

They actually need the feature, but it can be added later.

>> And finally... I assume there is one qemu-pr-manager (qemu.conf changes
>> eventually)... Eventually there's magic that allows/adds per domain
>> *and* per LUN some socket path. If libvirt provided it's generated via
>> the domain temporary directory; however, what's not really clear is how
>> that unmanaged path really works.  Need a virtual whiteboard...
> 
> So, in case of unmanaged path, here are the assumptions that my patches
> are built on:
> 
> 1) unmanaged helper process (UHP) is spawned by somebody other than
> libvirtd (hence unmanaged) - it doesn't have to be the user, it can be
> systemd for instance.
> 
> 2) path to UHP's socket has to be labeled correctly - libvirt doesn't
> touch that
> 
> 3) in the future, when the UHP dies, libvirt will NOT spawn it again.
> It's unmanaged after all. It's the user's/sysadmin's responsibility to
> spawn it again.

Correct.

> Now, for the managed helper process (MHP) the assumptions are:
> 
> 1) there's one MHP per domain (all SCSI disks in the domain share the
> same MHP).
> 
> 2) the MHP runs as root, but is placed into the same CGroup and mount
> namespace as the qemu process it serves
> 
> 3) the MHP lives and dies with the domain it is associated with.

Correct, with the caveat that QEMU must provide the MHP state and death
event for this to be complete.

Thanks,

Paolo

> The code might be more complicated than needed - it is prepared to have
> one MHP per disk rather than per domain (should we ever need it).
> Therefore, instead of storing a single pid_t, we store them in a hash
> table where more can be stored.
> 
> Michal
> 



