[Virtio-fs] (no subject)

Hanna Czenczek hreitz at redhat.com
Tue Oct 10 08:18:29 UTC 2023


On 10.10.23 06:00, Yajun Wu wrote:
>
> On 10/9/2023 5:13 PM, Hanna Czenczek wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 09.10.23 11:07, Hanna Czenczek wrote:
>>> On 09.10.23 10:21, Hanna Czenczek wrote:
>>>> On 07.10.23 04:22, Yajun Wu wrote:
>>> [...]
>>>
>>>>> The main motivation of adding VHOST_USER_SET_STATUS is to let
>>>>> backend DPDK know
>>>>> when DRIVER_OK bit is valid. It's an indication of all VQ
>>>>> configuration has sent,
>>>>> otherwise DPDK has to rely on first queue pair is ready, then
>>>>> receiving/applying
>>>>> VQ configuration one by one.
>>>>>
>>>>> During live migration, configuring VQ one by one is very time
>>>>> consuming.
>>>> One question I have here is why it wasn’t then introduced in the live
>>>> migration code, but in the general VM stop/cont code instead. It does
>>>> seem time-consuming to do this every time the VM is paused and 
>>>> resumed.
>
> Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe 
> because there's no device level stop/cont vhost message?

No, it is because qemu will reset the status in stop/cont*, which it 
should not do.  Aside from guest-initiated resets, the only thing where 
a reset comes into play is when the back-end is changed, e.g. during 
migration.  In that case, the source back-end will see a disconnect on 
the vhost-user socket and can then do whatever uninitialization it needs 
to do, and the destination front-end will need to be reconfigured by 
qemu anyway, because it’s just a case of the destination qemu initiating 
a fresh connection to a new back-end (except that it will need to 
restore the state from the source).

*Yes, technically, dpdk will ignore that reset, but it still stops the 
device on a different message (when it should just pause processing 
vrings), so the outcome is the same.

>>>>
>>>>> For VIRTIO
>>>>> net vDPA, HW needs to know how many VQs are enabled to set
>>>>> RSS(Receive-Side Scaling).
>>>>>
>>>>> If you don’t want SET_STATUS message, backend can remove protocol
>>>>> feature bit
>>>>> VHOST_USER_PROTOCOL_F_STATUS.
>>>> The problem isn’t back-ends that don’t want the message, the problem
>>>> is that qemu uses the message wrongly, which prevents well-behaving
>>>> back-ends from implementing the message.
>>>>
>>>>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
>>>>> close/reset.
>>>> So the right thing to do for back-ends is to announce STATUS support
>>>> and then not implement it correctly?
>>>>
>>>> GET_VRING_BASE should not reset the close or reset the device, by the
>>>> way.  It should stop that one vring, not more.  We have a
>>>> RESET_DEVICE command for resetting.
> I believe dpdk uses GET_VRING_BASE long before qemu has RESET_DEVICE? 

I don’t think it matters who came first.  What matters is the 
specification, and that dpdk decided to rely on implementation-specific 
behavior without having all involved parties agree by matters of putting 
that in the specification.  And now dpdk clearly deviates from the 
specification as a result of that action, which can result in problems 
if the front-end doesn’t do what qemu always used to do.  (E.g. the 
front-end might just send GET_VRING_BASE for all vrings when suspending 
the guest, and then only send kicks on resume to re-start the vrings.  
dpdk would most likely be left in a state where the whole device is 
stopped, expecting DRIVER_OK.  Same thing in general for front-ends that 
don’t support F_STATUS.)

> It's a compatible issue. For new backend implements, we can have 
> better solution, right?

The fact that dpdk and qemu deviate from the specification is a problem 
as-is.

>>>>> I'm not involved in discussion about adding SET_STATUS in Vhost
>>>>> protocol. This feature
>>>>> is essential for vDPA(same as vhost-vdpa implements
>>>>> VHOST_VDPA_SET_STATUS).
>>>> So from what I gather from your response is that there is only a
>>>> single use for SET_STATUS, which is the DRIVER_OK bit.  If so,
>>>> documenting that all other bits are to be ignored by both back-end
>>>> and front-end would be fine by me.
>>>>
>>>> I’m not fully serious about that suggestion, but I hear the strong
>>>> implication that nothing but DRIVER_OK was of any concern, and this
>>>> is really important to note when we talk about the status of the
>>>> STATUS feature in vhost today.  It seems to me now that it was not
>>>> intended to be the virtio-level status byte, but just a DRIVER_OK
>>>> signalling path from front-end to back-end.  That makes it a
>>>> vhost-level protocol feature to me.
>>> On second thought, it just is a pure vhost-level protocol feature, and
>>> has nothing to do with the virtio status byte as-is.  The only stated
>>> purpose is for the front-end to send DRIVER_OK after migration, but
>>> migration is transparent to the guest, so the guest would never change
>>> the status byte during migration.  Therefore, if this feature is
>>> essential, we will never be able to have a status byte that is
>>> transparently shared between guest and back-end device, i.e. the
>>> virtio status byte.
>> On third thought, scratch that.  The guest wouldn’t set it, but
>> naturally, after migration, the front-end will need to restore the
>> status byte from the source, so the front-end will always need to set
>> it, even if it were otherwise used controlled only by the guest and the
>> back-end device.  So technically, this doesn’t prevent such a use case.
>> (In practice, it isn’t controlled by the guest right now, but that could
>> be fixed.)
> I only tested the feature with DPDK(the only backend use it today?). 
> Max defined the protocol and added the corresponding code in DPDK 
> before I added QEMU support. If other backend or different device type 
> want to use this, we can have further discussion?

So as far as I understand, the feature is supposed to rely on 
implementation-specific behavior between specifically qemu as a 
front-end and dpdk as a back-end, nothing else.  Honestly, that to me is 
a very good reason to deprecate it.  That would make it clear that any 
implementation that implements it does so because it relies on 
implementation-specific behavior from other implementations.

Option 2 is to fix it.  It is not right to use this broadly defined 
feature with its clear protocol as given in the virtio specification 
just to set and clear a single bit (DRIVER_OK).  The vhost-user 
specification points to that virtio protocol.  We must adhere to the 
protocol.  And note that we must not reset devices just because the VM 
is paused/resumed.  (That is why I wanted to deprecate SET_STATUS, so 
that Stefan’s series would introduce RESET_DEVICE where we need it, and 
we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().)

Option 3 would be to just be honest in the specification, and limit the 
scope of F_STATUS to say the only bit that matters is DRIVER_OK.  I 
would say this is not really different from deprecating, though it 
wouldn’t affect your case.  However, I understand Alex relies on a full 
status byte.  I’m still interested to know why that is.

Option 4 is of course not to do anything, and leave everything as-is, 
waiting for the next person to stir the hornet’s nest.

>>> Cc-ing Alex on this mail, because to me, this seems like an important
>>> detail when he plans on using the byte in the future. If we need a
>>> virtio status byte, I can’t see how we could use the existing F_STATUS
>>> for it.
>>>
>>> Hanna
>



More information about the Virtio-fs mailing list