[Virtio-fs] (no subject)

German Maglione gmaglione at redhat.com
Tue Oct 10 10:04:18 UTC 2023


On Tue, Oct 10, 2023 at 4:57 AM Yajun Wu <yajunw at nvidia.com> wrote:
>
>
> On 10/9/2023 6:28 PM, German Maglione wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yajunw at nvidia.com> wrote:
> >>
> >> On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:
> >>>> On 06.10.23 11:26, Michael S. Tsirkin wrote:
> >>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:
> >>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote:
> >>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:
> >>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:
> >>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:
> >>>>>>>>>>> There is no clearly defined purpose for the virtio status byte in
> >>>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
> >>>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES.  With the REPLY_ACK
> >>>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors
> >>>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES).
> >>>>>>>>>>>
> >>>>>>>>>>> As for implementations, SET_STATUS is not widely implemented.  dpdk does
> >>>>>>>>>>> implement it, but only uses it to signal feature negotiation failure.
> >>>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively
> >>>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today
> >>>>>>>>>>> means the same thing as RESET_DEVICE).
> >>>>>>>>>>>
> >>>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not
> >>>>>>>>>>> forward the guest-set status byte, but instead just makes it up
> >>>>>>>>>>> internally, and actually completely ignores what the back-end returns,
> >>>>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single
> >>>>>>>>>>> bits to it.  Notably, after setting FEATURES_OK, it never reads it back
> >>>>>>>>>>> to see whether the flag is still set, which is the only way in which
> >>>>>>>>>>> dpdk uses the status byte.
> >>>>>>>>>>>
> >>>>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this
> >>>>>>>>>>> field in a useful manner, and it also provides no practical use over
> >>>>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly
> >>>>>>>>>>> defined.  Deprecate it.
> >>>>>>>>>>>
> >>>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha at redhat.com>
> >>>>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz at redhat.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>       docs/interop/vhost-user.rst | 28 +++++++++++++++++++++-------
> >>>>>>>>>>>       1 file changed, 21 insertions(+), 7 deletions(-)
> >>>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
> >>>>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
> >>>>>>>>> The fact current backends never check errors does not mean they never
> >>>>>>>>> will. So no, not applying this.
> >>>>>>>> Can this not be done with REPLY_ACK?  I.e., with the following message
> >>>>>>>> order:
> >>>>>>>>
> >>>>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
> >>>>>>>> present
> >>>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
> >>>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
> >>>>>>>> 4. SET_FEATURES with need_reply
> >>>>>>>>
> >>>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the
> >>>>>>>> vCPUs are stopped, which generally seems to request a device reset.  If we
> >>>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will
> >>>>>>>> implement SET_STATUS later may break with at least these qemu versions.  But
> >>>>>>>> documenting that a particular use of the status byte is to be ignored would
> >>>>>>>> be really strange.
> >>>>>>>>
> >>>>>>>> Hanna
> >>>>>>> Hmm I guess. Though just following virtio spec seems cleaner to me...
> >>>>>>> vhost-user reconfigures the state fully on start.
> >>>>>> Not the internal device state, though.  virtiofsd has internal state, and
> >>>>>> other devices like vhost-gpu back-ends would probably, too.
> >>>>>>
> >>>>>> Stefan has recently sent a series
> >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to
> >>>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a
> >>>>>> reset).
> >>>>>>
> >>>>>> I really don’t like our current approach with the status byte. Following the
> >>>>>> virtio specification to me would mean that the guest directly controls this
> >>>>>> byte, which it does not.  qemu makes up values as it deems appropriate, and
> >>>>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e.
> >>>>>> when the guest really doesn’t want a device reset.
> >>>>>>
> >>>>>> That means that qemu does not treat this as a virtio device field (because
> >>>>>> that would mean exposing it to the guest driver), but instead treats it as
> >>>>>> part of the vhost(-user) protocol.  It doesn’t feel right to me that we use
> >>>>>> a virtio-defined feature for communication on the vhost level, i.e. between
> >>>>>> front-end and back-end, and not between guest driver and device.  I think
> >>>>>> all vhost-level protocol features should be fully defined in the vhost-user
> >>>>>> specification, which REPLY_ACK is.
> >>>>> Hmm that makes sense. Maybe we should have done what stefan's patch
> >>>>> is doing.
> >>>>>
> >>>>> Do look at the original commit that introduced it to understand why
> >>>>> it was added.
> >>>> I don’t understand why this was added to the stop/cont code, though.  If it
> >>>> is time consuming to make these changes, why are they done every time the VM
> >>>> is paused
> >>>> and resumed?  It makes sense that this would be done for the initial
> >>>> configuration (where a reset also wouldn’t hurt), but here it seems wrong.
> >>>>
> >>>> (To be clear, a reset in the stop/cont code is wrong, because it breaks
> >>>> stateful devices.)
> >>>>
> >>>> Also, note the newer commits 6f8be29ec17 and c3716f260bf.  The reset as
> >>>> originally introduced was wrong even for non-stateful devices, because it
> >>>> occurred before we fetched the state (vring indices) so we could restore it
> >>>> later.  I don’t know how 923b8921d21 was tested, but if the back-end used
> >>>> for testing implemented SET_STATUS 0 as a reset, it could not have survived
> >>>> either migration or a stop/cont in general, because the vring indices would
> >>>> have been reset to 0.
> >>>>
> >>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all
> >>>> devices that would implement them as per virtio spec, and even today it’s
> >>>> broken for stateful devices.  The mentioned performance issue is likely
> >>>> real, but we can’t address it by making up SET_STATUS calls that are wrong.
> >>>>
> >>>> I concede that I didn’t think about DRIVER_OK.  Personally, I would do all
> >>>> final configuration that would happen upon a DRIVER_OK once the first vring
> >>>> is started (i.e. receives a kick).  That has the added benefit of being
> >>>> asynchronous because it doesn’t block any vhost-user messages (which are
> >>>> synchronous, and thus block downtime).
> >>>>
> >>>> Hanna
> >>> For better or worse kick is per ring. It's out of spec to start rings
> >>> that were not kicked but I guess you could do configuration ...
> >>> Seems somewhat asymmetrical though.
> >>>
> >>> Let's wait until next week, hopefully Yajun Wu will answer.
> >> The main motivation of adding VHOST_USER_SET_STATUS is to let backend
> >> DPDK know
> >> when DRIVER_OK bit is valid. It's an indication of all VQ configuration
> >> has sent,
> >> otherwise DPDK has to rely on first queue pair is ready, then
> >> receiving/applying
> >> VQ configuration one by one.
> >>
> >> During live migration, configuring VQ one by one is very time consuming.
> >> For VIRTIO
> >> net vDPA, HW needs to know how many VQs are enabled to set
> >> RSS(Receive-Side Scaling).
> >>
> >> If you don’t want SET_STATUS message, backend can remove protocol
> >> feature bit
> >> VHOST_USER_PROTOCOL_F_STATUS.
> >> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
> >> close/reset.
> > This is incorrect, resetting the device on GET_VRING_BASE breaks
> > the stop/cont. Since you don't want to reset the VQs on stop/cont.
> Sorry for the misunderstanding, dpdk vhost backend framework doesn't
> have RESET concept(only device level .dev_conf and .dev_close). On
> receiving DRIVER_OK does dev_conf, on receiving GET_VRING_BASE does
> dev_close. For every VM suspend/resume, dpdk issues dev_close then dev_conf.

(sorry I did not explain myself well)
I meant that resetting the VQs upon receiveng GET_VRING_BASE makes the
backend to fail if qemu continues after a "stop". I notice that in dpdk,
when it receives a GET_VRING_BASE[0], it calls 'vring_invalidate(dev, vq);'[1],
resetting the VQ[2], doing that is incorrect.

[0] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2135
[1] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2201
[2] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost.c#L580

> >
> >> I'm not involved in discussion about adding SET_STATUS in Vhost
> >> protocol. This feature
> >> is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS).
> >>
> >> Thanks,
> >> Yajun
> >>>>>> Now, we could hand full control of the status byte to the guest, and that
> >>>>>> would make me content.  But I feel like that doesn’t really work, because
> >>>>>> qemu needs to intercept the status byte anyway (it needs to know when there
> >>>>>> is a reset, probably wants to know when the device is configured, etc.), so
> >>>>>> I don’t think having the status byte in vhost-user really gains us much when
> >>>>>> qemu could translate status byte changes to/from other vhost-user commands.
> >>>>>>
> >>>>>> Hanna
> >>>>> well it intercepts it but I think it could pass it on unchanged.
> >>>>>
> >>>>>
> >>>>>>> I guess symmetry was the
> >>>>>>> point. So I don't see why SET_STATUS 0 has to be ignored.
> >>>>>>>
> >>>>>>>
> >>>>>>> SET_STATUS was introduced by:
> >>>>>>>
> >>>>>>> commit 923b8921d210763359e96246a58658ac0db6c645
> >>>>>>> Author: Yajun Wu <yajunw at nvidia.com>
> >>>>>>> Date:   Mon Oct 17 14:44:52 2022 +0800
> >>>>>>>
> >>>>>>>         vhost-user: Support vhost_dev_start
> >>>>>>>
> >>>>>>> CC the author.
> >>>>>>>
> >> _______________________________________________
> >> Virtio-fs mailing list
> >> Virtio-fs at redhat.com
> >> https://listman.redhat.com/mailman/listinfo/virtio-fs
> >
> >
> > --
> > German
> >
>


-- 
German



More information about the Virtio-fs mailing list