[edk2-devel] [PATCH v2 15/17] OvmfPkg/PvScsiDxe: Support sending SCSI request and receive response

Laszlo Ersek lersek at redhat.com
Fri Mar 27 21:05:17 UTC 2020


On 03/27/20 14:04, Liran Alon wrote:
> 
> On 27/03/2020 14:26, Laszlo Ersek wrote:
>> On 03/25/20 17:10, Liran Alon wrote:
>>> +/**
>>> +  Returns if PVSCSI request ring is full
>>> +**/
>>> +STATIC
>>> +BOOLEAN
>>> +PvScsiIsReqRingFull (
>>> +  IN CONST PVSCSI_DEV   *Dev
>>> +  )
>>> +{
>>> +  PVSCSI_RINGS_STATE *RingsState;
>>> +  UINT32             ReqNumEntries;
>>> +
>>> +  RingsState = Dev->RingDesc.RingState;
>>> +  ReqNumEntries = 1U << RingsState->ReqNumEntriesLog2;
>>> +  return (RingsState->ReqProdIdx - RingsState->CmpConsIdx) >=
>>> ReqNumEntries;
>>> +}
>> (Just some thoughts, not a request for changing the code.)
>>
>> Normally I prefer accessing buffers shared with the device through
>> volatile-qualified pointers.
>>
>> Meaning, in this case, that every "PCI host" pointer (i.e., each pointer
>> that is associated with a PVSCSI_DMA_DESC) would have to be
>> volatile-qualified. In particular:
>>
>> - in patch#13, PVSCSI_RING_DESC would have to be updated like this:
>>
>>> typedef struct {
>>>    volatile PVSCSI_RINGS_STATE   *RingState;
>>>    PVSCSI_DMA_DESC               RingStateDmaDesc;
>>>
>>>    volatile PVSCSI_RING_REQ_DESC *RingReqs;
>>>    PVSCSI_DMA_DESC               RingReqsDmaDesc;
>>>
>>>    volatile PVSCSI_RING_CMP_DESC *RingCmps;
>>>    PVSCSI_DMA_DESC               RingCmpsDmaDesc;
>>> } PVSCSI_RING_DESC;
>> - in patch#14, PVSCSI_DEV would change as follows:
>>
>>> typedef struct {
>>>    UINT32                          Signature;
>>>    EFI_PCI_IO_PROTOCOL             *PciIo;
>>>    EFI_EVENT                       ExitBoot;
>>>    UINT64                          OriginalPciAttributes;
>>>    PVSCSI_RING_DESC                RingDesc;
>>>    volatile PVSCSI_DMA_BUFFER      *DmaBuf;
>>>    PVSCSI_DMA_DESC                 DmaBufDmaDesc;
>>>    UINT8                           MaxTarget;
>>>    UINT8                           MaxLun;
>>>    UINTN                           WaitForCmpStallInUsecs;
>>>    EFI_EXT_SCSI_PASS_THRU_PROTOCOL PassThru;
>>>    EFI_EXT_SCSI_PASS_THRU_MODE     PassThruMode;
>>> } PVSCSI_DEV;
>> After these changes, the compiler would (justifiedly) flag a bunch of
>> code locations casting away the volatile qualification -- for example,
>> in the above function, in the assignment to the "RingsState" local
>> variable.
>>
>> Clearly, most of these compilation errors would have to be fixed (not
>> suppressed), because they would be valid. Meaning:
>>
>> - you'd have to volatile-qualify the "RingsState" local variable in all
>>    of PvScsiIsReqRingFull(), PvScsiGetCurrentRequest(),
>>    PvScsiWaitForRequestCompletion();
>>
>> - you'd also have to volatile-qualify the return types of
>>    PvScsiGetCurrentRequest() and PvScsiWaitForRequestCompletion();
>>
>> - you'd have to update PopulateRequest() and HandleResponse() too; and
>>    the most annoying part of that would be that you could no longer use
>>    CopyMem() and ZeroMem() -- because those functions take
>>    pointer-to-void parameters, rather than pointer-to-volatile-void ones.
>>
>> (FWIW, we wouldn't have to change the PvScsiFreeSharedPages() prototype
>> -- it would be OK to cast away volatile in those calls, as we wouldn't
>> dereference the pointers in that case.)
>>
>> So... the reason I'm not actually requesting these
>> volatile-qualifications is that (a) your use of MemoryFence() seems
>> mostly OK, and (b) the UEFI Driver Writer's guide recommends *either*
>> volatile *or* MemoryFence(). Of course using both techniques at the same
>> time is not a problem -- and in code I write I actually like to use both
>> at the same time --, but just one suffices too. (See section 4.2.6
>> "Memory ordering" in the DWG.)
>>
>> The reason I'm writing this up here is because I want the "record" (the
>> mailing list archive) to show that we have considered this topic
>> explicitly.

> I prefer to remain with only memory fences if that's OK by you.

Yes, that's fine.

> As the code is written now.
> As it allows for potential compiler optimization and leads to more
> readable code in my opinion.

The UEFI Driver Writer's Guide makes the same argument -- it favors
explicit MemoryFence()s over volatile. So your suggestion is entirely
valid and I agree with it.
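
For the archive, a minimal sketch of the two styles being compared, written
in portable C11 rather than with edk2 types. The RingsState struct is a
hypothetical stand-in that keeps only the three fields used by the ring-full
check, and atomic_thread_fence() is used here as an assumed stand-in for
MemoryFence(); what MemoryFence() actually expands to depends on the
architecture and toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PVSCSI_RINGS_STATE; only the fields used here. */
typedef struct {
  uint32_t ReqProdIdx;
  uint32_t CmpConsIdx;
  uint32_t ReqNumEntriesLog2;
} RingsState;

/* Style 1: volatile-qualified pointer; every field read goes to memory. */
static bool RingFullVolatile(const volatile RingsState *State)
{
  uint32_t NumEntries = (uint32_t)1 << State->ReqNumEntriesLog2;

  /* Unsigned subtraction stays correct when the indices wrap around. */
  return (uint32_t)(State->ReqProdIdx - State->CmpConsIdx) >= NumEntries;
}

/* Style 2: plain pointer, with the ordering made explicit by a fence. */
static bool RingFullFenced(const RingsState *State)
{
  uint32_t NumEntries;

  /* Assumed stand-in for MemoryFence(): no reordering across this point. */
  atomic_thread_fence(memory_order_seq_cst);

  NumEntries = (uint32_t)1 << State->ReqNumEntriesLog2;
  return (uint32_t)(State->ReqProdIdx - State->CmpConsIdx) >= NumEntries;
}
```

Both functions compute the same result; the difference is only in how the
compiler is told that the memory may change underneath it.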

>> Back to your patch:
>>
>> On 03/25/20 17:10, Liran Alon wrote:
>>> +  //
>>> +  // This cast is safe as MaxLun is defined as UINT8
>>> +  //
>>> +  Request->Lun[1] = (UINT8)Lun;
>>> +  Request->SenseLen = Packet->SenseDataLength;
>> Ah, *now* I understand why you chose MAX_UINT8 as the size of
>> "PVSCSI_DMA_BUFFER.SenseData". Because, "Packet->SenseDataLength" has
>> type UINT8, and this way you guarantee that the SCSI client's
>> "Packet->SenseDataLength" will always fit in the DMA buffer.
>>
>> Good solution, but it *absolutely* needs to be documented in patch#14
>> ("OvmfPkg/PvScsiDxe: Introduce DMA communication buffer") -- in fact,
>> see my question (4) under patch#14.

> Please read the response I have written you to your patch#14 review.
> Where I suggest we define a constant in IndustryStandard/Scsi.h for the
> limit of the total length of SenseData that is defined to be 252
> according to SCSI specification.

MdePkg macro is good, but it should be decoupled from this series.

>>
>> (2) Also, please add a comment here that a "Dev->DmaBuf->SenseData"
>> overflow is not possible due to "Packet->SenseDataLength" having type
>> UINT8.
>>
>> This would be a comment in the same vein as the "MaxLun" reference just
>> above -- I find *that* comment very helpful, too.
> OK.
>>> +
>>> +  return EFI_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Handle the PVSCSI device response:
>>> +  - Copy returned data from DMA communication buffer.
>>> +  - Update fields in Extended SCSI Pass Thru Protocol packet as
>>> required.
>>> +  - Translate response code to EFI status code and host adapter status.
>>> +**/
>>> +STATIC
>>> +EFI_STATUS
>>> +HandleResponse (
>>> +  IN PVSCSI_DEV                                     *Dev,
>>> +  IN OUT EFI_EXT_SCSI_PASS_THRU_SCSI_REQUEST_PACKET *Packet,
>>> +  IN CONST PVSCSI_RING_CMP_DESC                     *Response
>>> +  )
>>> +{
>>> +  //
>>> +  // Check if device returned sense data
>>> +  //
>>> +  if (Response->ScsiStatus ==
>>> EFI_EXT_SCSI_STATUS_TARGET_CHECK_CONDITION) {
>>> +    //
>>> +    // Fix SenseDataLength to amount of data returned
>>> +    //
>>> +    if (Packet->SenseDataLength > Response->SenseLen) {
>>> +      Packet->SenseDataLength = (UINT8)Response->SenseLen;
>>> +    }
>>> +    //
>>> +    // Copy sense data from DMA communication buffer
>>> +    //
>>> +    CopyMem (
>>> +      Packet->SenseData,
>>> +      Dev->DmaBuf->SenseData,
>>> +      Packet->SenseDataLength
>>> +      );
>>> +  } else {
>>> +    //
>>> +    // Signal no sense data returned
>>> +    //
>>> +    Packet->SenseDataLength = 0;
>>> +  }
>>> +
>>> +  //
>>> +  // Copy device output from DMA communication buffer
>>> +  //
>>> +  if (Packet->DataDirection == EFI_EXT_SCSI_DATA_DIRECTION_READ) {
>>> +    CopyMem (Packet->InDataBuffer, Dev->DmaBuf->Data,
>>> Packet->InTransferLength);
>>> +  }
>> I'm unfamiliar with the PVSCSI device model, but I think this is not
>> general enough. The "PVSCSI_RING_CMP_DESC.DataLen" field suggests that
>> short reads are possible at least in theory.
>>
>> (5) If a short read occurs (Response->DataLen <
>> Packet->InTransferLength), then we should adjust
>> "Packet->InTransferLength", and also copy that many bytes only.
>>
>> (6) I think it would be prudent to update "Packet->OutTransferLength"
>> too, for short writes.
> As you can see below, this is done in case the device returns
> Response->HostStatus as either PvScsiBtStatDatarun or
> PvScsiBtStatDataUnderrun.
>>
>>> +
>>> +  //
>>> +  // Report target status
>>> +  //
>>> +  Packet->TargetStatus = Response->ScsiStatus;
>>> +
>>> +  //
>>> +  // Host adapter status and function return value depend on
>>> +  // device response's host status
>>> +  //
>>> +  switch (Response->HostStatus) {
>>> +    case PvScsiBtStatSuccess:
>>> +    case PvScsiBtStatLinkedCommandCompleted:
>>> +    case PvScsiBtStatLinkedCommandCompletedWithFlag:
>>> +      Packet->HostAdapterStatus = EFI_EXT_SCSI_STATUS_HOST_ADAPTER_OK;
>>> +      return EFI_SUCCESS;
>>> +
>>> +    case PvScsiBtStatSelTimeout:
>>> +      Packet->HostAdapterStatus =
>>> +                EFI_EXT_SCSI_STATUS_HOST_ADAPTER_SELECTION_TIMEOUT;
>>> +      return EFI_TIMEOUT;
>>> +
>>> +    case PvScsiBtStatDatarun:
>>> +    case PvScsiBtStatDataUnderrun:
>>> +      //
>>> +      // Report residual data in overrun/underrun
>>> +      //
>>> +      if (Packet->DataDirection == EFI_EXT_SCSI_DATA_DIRECTION_READ) {
>>> +        Packet->InTransferLength = Response->DataLen;
>>> +      } else {
>>> +        Packet->OutTransferLength = Response->DataLen;
>>> +      }
>> OK, if we are sure that (a) the device will always report short
>> reads/writes like this, and that (b) the above assignments will never
>> cause InTransferLength / OutTransferLength to *grow*, then the
>> InTransferLength / OutTransferLength adjustments are sufficiently
>> covered.
> I believe both of these are indeed true.
> Even though the current QEMU VMware PVSCSI device emulation code has a
> bug: it never sets this in pvscsi_command_complete() when it does
> set BTSTAT_DATARUN...
>> Still:
>>
>> (8) The CopyMem() call above should not copy garbage (at the tail).
> I don't think it matters. We don't guarantee anything on the content in
> Packet->InDataBuffer beyond Packet->InTransferLength.
> I think the code is simpler how it is currently written.

I'm not convinced, but this is not a question I feel very strongly
about. I'm OK to go with your preference.
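
A minimal sketch of what (5) and (8) would amount to, in plain C rather than
edk2 types. CopyClamped() is a hypothetical helper, not something in the
patch, and treating the device-reported length as a 64-bit value is an
assumption about "PVSCSI_RING_CMP_DESC.DataLen".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most Requested bytes, but never more than
 * the device says it transferred. Returns the number of bytes copied, which
 * the caller would store back into the transfer length field.
 */
static size_t CopyClamped(void *Dst, size_t Requested,
                          const void *Src, uint64_t DeviceLen)
{
  size_t Len = Requested;

  /* Short read: shrink the copy so no stale tail bytes are handed over. */
  if (DeviceLen < Len) {
    Len = (size_t)DeviceLen;
  }

  memcpy(Dst, Src, Len);
  return Len;
}
```

With this shape the length can only shrink, which also covers the concern
that the adjustment must never cause the transfer length to grow.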

>>
>> Honestly, *if* the PVSCSI device model always sets "Response->DataLen",
> I don't think this is the case.
>> then I would prefer if:
>>
>> - we always updated InTransferLength / OutTransferLength (regardless of
>> "Response->HostStatus"),
>>
>> - and we only used these case labels (PvScsiBtStatDatarun /
>> PvScsiBtStatDataUnderrun) for setting "Packet->HostAdapterStatus".
>>
>>> +      Packet->HostAdapterStatus =
>>> +                EFI_EXT_SCSI_STATUS_HOST_ADAPTER_DATA_OVERRUN_UNDERRUN;
>>> +      return EFI_BAD_BUFFER_SIZE;
>> I think EFI_BAD_BUFFER_SIZE is invalid here. According to the UEFI spec,
>> EFI_BAD_BUFFER_SIZE means "The SCSI Request Packet was not executed".
>> But that's not the case here -- we do have a partially completed
>> transfer.
> 
> Hmm... According to the documentation above EFI_SCSI_PASS_THRU_PASSTHRU
> in MdePkg/Include/Protocol/ScsiPassThru.h:
> 
>   @retval EFI_BAD_BUFFER_SIZE       The SCSI Request Packet was
> executed, but the
>                                     entire DataBuffer could not be
> transferred.
>                                     The actual number of bytes
> transferred is returned
>                                     in TransferLength. See
> HostAdapterStatus,
>                                     TargetStatus, SenseDataLength, and
> SenseData in
>                                     that order for additional status
> information.
> 
> So I don't know who to believe... It does seem to me that this
> documentation in the code makes more sense
> and then my current code is correct. What do you think?

You are looking at the wrong protocol header file. The top of this
header file bears the comment

  SCSI Pass Through protocol as defined in EFI 1.1.

and the UEFI-2.8 spec does not define EFI_SCSI_PASS_THRU_PROTOCOL; it
only refers to Mantis ticket 845
<https://mantis.uefi.org/mantis/view.php?id=845> with subject
"EFI_SCSI_PASS_THRU_PROTOCOL replacement".

Instead, please consult EFI_EXT_SCSI_PASS_THRU_PASSTHRU in
"MdePkg/Include/Protocol/ScsiPassThruExt.h". There, the
EFI_BAD_BUFFER_SIZE return value conforms to the UEFI 2.8 spec ("The
SCSI Request Packet was not executed").
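
As a rough sketch of the control flow that point (9) asks for, using
hypothetical enum names in place of the real PvScsiBtStat* and EFI_* values:
the overrun/underrun case still records the host adapter status, but then
falls out of the switch to the common device-error return instead of
returning a "not executed" code.

```c
/* All three enums are hypothetical stand-ins, not edk2 definitions. */
typedef enum { StatSuccess, StatDataOverrun, StatDataUnderrun, StatOther } HostStatus;
typedef enum { AdapterOk, AdapterOverrunUnderrun, AdapterOther } AdapterStatus;
typedef enum { ResultSuccess, ResultDeviceError } Result;

static Result TranslateHostStatus(HostStatus Status, AdapterStatus *Adapter)
{
  switch (Status) {
    case StatSuccess:
      *Adapter = AdapterOk;
      return ResultSuccess;

    case StatDataOverrun:
    case StatDataUnderrun:
      /* The request did execute (partially), so report it as a device error. */
      *Adapter = AdapterOverrunUnderrun;
      break;

    default:
      *Adapter = AdapterOther;
      break;
  }

  return ResultDeviceError;
}
```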

> 
>>
>> (9) Thus I feel we should use a "break" here.
>>
>>> +
>>> +    case PvScsiBtStatBusFree:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_BUS_FREE;
>>> +      break;
>>> +
>>> +    case PvScsiBtStatInvPhase:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_PHASE_ERROR;
>>> +      break;
>>> +
>>> +    case PvScsiBtStatSensFailed:
>>> +      Packet->HostAdapterStatus =
>>> +                EFI_EXT_SCSI_STATUS_HOST_ADAPTER_REQUEST_SENSE_FAILED;
>>> +      break;
>>> +
>>> +    case PvScsiBtStatTagReject:
>>> +    case PvScsiBtStatBadMsg:
>>> +      Packet->HostAdapterStatus =
>>> +          EFI_EXT_SCSI_STATUS_HOST_ADAPTER_MESSAGE_REJECT;
>>> +      break;
>>> +
>>> +    case PvScsiBtStatBusReset:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_BUS_RESET;
>>> +      break;
>>> +
>>> +    case PvScsiBtStatHaTimeout:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_TIMEOUT;
>>> +      return EFI_TIMEOUT;
>>> +
>>> +    case PvScsiBtStatScsiParity:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_PARITY_ERROR;
>>> +      break;
>>> +
>>> +    default:
>>> +      Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_OTHER;
>>> +      break;
>>> +  }
>>> +
>>> +  return EFI_DEVICE_ERROR;
>>> +}
>>> +
>>>
>>>   //
>>>   // Ext SCSI Pass Thru
>>>   //
>>> @@ -144,7 +528,62 @@ PvScsiPassThru (
>>>     IN EFI_EVENT                                      Event    OPTIONAL
>>>     )
>>>   {
>>> -  return EFI_UNSUPPORTED;
>>> +  PVSCSI_DEV            *Dev;
>>> +  EFI_STATUS            Status;
>>> +  PVSCSI_RING_REQ_DESC *Request;
>>> +  PVSCSI_RING_CMP_DESC *Response;
>>> +
>>> +  Dev = PVSCSI_FROM_PASS_THRU (This);
>>> +
>>> +  if (PvScsiIsReqRingFull (Dev)) {
>>> +    return EFI_NOT_READY;
>>> +  }
>>> +
>>> +  Request = PvScsiGetCurrentRequest (Dev);
>>> +
>>> +  Status = PopulateRequest (Dev, Target, Lun, Packet, Request);
>>> +  if (EFI_ERROR (Status)) {
>>> +    return Status;
>>> +  }
>>> +
>>> +  //
>>> +  // Writes to Request must be globally visible before making request
>>> +  // available to device
>>> +  //
>>> +  MemoryFence ();
>>> +  Dev->RingDesc.RingState->ReqProdIdx++;
>>> +
>> (10) Please insert another MemoryFence () here.
> 
> That would be unnecessary and wrong.
> 
> The MemoryFence() here is used to make sure the request is globally
> visible before the update to the producer-index.

I agree.

> As in any
> circular-buffer implementation.
> There is no need for an additional MemoryFence() here.
> 
> Note that the MMIO access below is guaranteed to be globally visible
> only after the write to the producer-index.

Yes, that was the goal of my suggestion. What guarantees it?

> If EDK2 MMIO accessors wouldn't have guaranteed this, you would have a
> very broken code base...
> Similar to why Linux MMIO accessors (e.g. writel()) macros guarantee these.
> 
> For example, see how MdePkg/Library/BaseIoLibIntrinsic/IoLib.c
> MmioWrite32() internally calls MemoryFence() before and after MMIO
> access itself.

So basically you are saying that I proposed the right thing, except
there is no need to spell it out here, because the MMIO accessor
primitives already cover that internally :)

I admit that I have not been aware of the internal fences!

(And given that there is a specific commit in the git history to push
the fences into the source file you mention, namely 9de780dcd6208, I do
think my suggestion was not "wrong", only unnecessary.)

I do agree that the MemoryFence() need not be added in this spot. Thanks
for making me aware of the internal fences!
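
To spell out the ordering we ended up agreeing on, here is a minimal sketch
in portable C11. Everything in it is a stand-in: Ring and ReqDesc are
hypothetical types, gDoorbell is an ordinary variable playing the role of
the MMIO register, and atomic_thread_fence() is assumed to play the role of
MemoryFence(). KickDoorbell() mimics the fence / access / fence pattern you
describe for MmioWrite32().

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 4u                       /* hypothetical ring size */

typedef struct { uint32_t Context; } ReqDesc; /* hypothetical descriptor */

typedef struct {
  uint32_t ReqProdIdx;
  ReqDesc  Reqs[RING_ENTRIES];
} Ring;

static volatile uint32_t gDoorbell;           /* stands in for an MMIO register */

/* Mimics an MMIO accessor that fences before and after the access. */
static void KickDoorbell(uint32_t Value)
{
  atomic_thread_fence(memory_order_seq_cst);
  gDoorbell = Value;
  atomic_thread_fence(memory_order_seq_cst);
}

static void SubmitRequest(Ring *R, uint32_t Context)
{
  /* 1. Fill in the descriptor at the current producer slot. */
  R->Reqs[R->ReqProdIdx % RING_ENTRIES].Context = Context;

  /* 2. Descriptor writes must be visible before the index update. */
  atomic_thread_fence(memory_order_seq_cst);
  R->ReqProdIdx++;

  /* 3. No explicit fence needed here: the accessor fences internally. */
  KickDoorbell(0);
}
```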

> 
>>
>>> +  Status = PvScsiMmioWrite32 (Dev, PvScsiRegOffsetKickRwIo, 0);
>>> +  if (EFI_ERROR (Status)) {
>>> +    //
>>> +    // If kicking the host fails, we must fake a host adapter error.
>>> +    // EFI_NOT_READY would save us the effort, but it would also
>>> suggest that
>>> +    // the caller retry.
>>> +    //
>>> +    return ReportHostAdapterError (Packet);
>>> +  }
>>> +
>>> +  Status = PvScsiWaitForRequestCompletion (Dev);
>>> +  if (EFI_ERROR (Status)) {
>>> +    //
>>> +    // If waiting for request completion fails, we must fake a host
>>> adapter
>>> +    // error. EFI_NOT_READY would save us the effort, but it would
>>> also suggest
>>> +    // that the caller retry.
>>> +    //
>>> +    return ReportHostAdapterError (Packet);
>>> +  }
>>> +
>> (11) Please insert a MemoryFence() here.
> 
> Why is a MemoryFence() needed here? I don't think that's true.
> 
> PvScsiWaitForRequestCompletion() ends with an MMIO write which is
> guaranteed to be a memory fence.

Yes, I see that now. My point was that a fence needed to *occur* here. I
didn't realize it was already covered, internally.

> Thus, there is no need for a MemoryFence() here (to serve as a rmb()) to
> make sure the completion-descriptor is globally visible.

Agreed.

Thanks,
Laszlo

