[edk2-devel] [PATCH V1 1/1] OvmfPkg/AcpiPlatformDxe: Add back log and fix code cohesion

Laszlo Ersek lersek at redhat.com
Thu Jan 5 09:09:24 UTC 2023


On 1/4/23 15:13, Min Xu wrote:
> From: Min M Xu <min.m.xu at intel.com>
>
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=4237
>
> Commit 9fdc70af6ba8 breaks log analysis and code cohesion in
> AcpiPlatformDxe. See the detailed information in BZ#4237.
>
> There are below changes in this patch:
> 1) Add back debug log
> 2) Add error checking and handling if InstallProtocolInterface failed.
> 3) Delete the QemuAcpiTableNotify.h because it is superfluous.
> 4) Delete the global variable "mQemuAcpiHandle" instead of a local
>    variable.
> 5) Install the QEMU_ACPI_TABLE_NOTIFY_PROTOCOL with a NULL associated
>    interface.
>
> Above changes 2/4/5 are applied in both CloudHvAcpi.c and QemuFwCfgAcpi.c.
>
> Cc: Laszlo Ersek <lersek at redhat.com>
> Cc: Erdem Aktas <erdemaktas at google.com>
> Cc: James Bottomley <jejb at linux.ibm.com>
> Cc: Jiewen Yao <jiewen.yao at intel.com>
> Cc: Gerd Hoffmann <kraxel at redhat.com>
> Cc: Tom Lendacky <thomas.lendacky at amd.com>
> Cc: Sebastien Boeuf <sebastien.boeuf at intel.com>
> Reported-by: Laszlo Ersek <lersek at redhat.com>
> Signed-off-by: Min Xu <min.m.xu at intel.com>
> ---
>  OvmfPkg/AcpiPlatformDxe/CloudHvAcpi.c         | 25 +++++------
>  OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c       | 42 +++++++++++++------
>  .../Include/Protocol/QemuAcpiTableNotify.h    | 27 ------------
>  3 files changed, 42 insertions(+), 52 deletions(-)
>  delete mode 100644 OvmfPkg/Include/Protocol/QemuAcpiTableNotify.h

First of all, I wouldn't like to make any comments on the
"CloudHvAcpi.c" source file. I wouldn't like my Reviewed-by to cover any
non-trivial change made to "CloudHvAcpi.c".

Therefore, I suggest that this patch be split up to a series of small
patches.

The first patch should, in my opinion, just remove the
QEMU_ACPI_TABLE_NOTIFY_PROTOCOL structure -- that is, "step 5". This
should be a trivial change to both QemuFwCfgAcpi.c and CloudHvAcpi.c,
and I'm comfortable reviewing that for both of these source files. This
patch can also cover "step 3", the removal of "QemuAcpiTableNotify.h".

Patches #2 and #3 should cover "step 4", *split* to separate patches
between CloudHvAcpi.c and QemuFwCfgAcpi.c. I would only like to review
the QemuFwCfgAcpi.c change, from these.

So then we are left with steps 1 and 2.

Patch #4 should just cover step 1, add back the debug log. This only
affects QemuFwCfgAcpi.c, which I'd like to review.

For step 2, patches #5 and #6 should add the missing error handling, and
move the protocol installation to the right spot, and extend the error
path accordingly. This must *definitely* be seprate for QemuFwCfgAcpi.c
and CloudHvAcpi.c, hence my request for splitting step 2 to two patches
(#5, #6). I won't look at the CloudHvAcpi.c, but will scrutinize the
heck out of QemuFwCfgAcpi.c.

--*--

I can already say that your current change for step 2 in
"QemuFwCfgAcpi.c" is wrong, or at least incomplete.

Please consider the following pattern:

It is frequently the case that (a) a function needs to create a set of
*permanent* resources (objects that are effectively the function's
"output"), which are supposed to be published if the functon entirely
succeeds, while at the same time (b) the function needs some *temporary*
resources for the creation of the permanent resource set. The
"permanent" resources must be preserved (= output to the caller, or
"committed" to the system as a side effect) if and only if all steps
succeed, whereas the "temporary" resources must be torn down (or rolled
back) *unconditionally*.

In InstallQemuFwCfgTables(), we have the following *temporary*
resources:

(1.1) LoaderStart -- the QEMU linker/loader script downloaded from
      fw_cfg

(1.2) AllocationsRestrictedTo32Bit -- a dictionary of fw_cfg blobs that
      must not be allocated from 64-bit address space

(1.3) Tracker -- a dictionary collecting the fw_cfg blobs that we have
      thus far downloaded from QEMU, as instructed by the linker/loader
      script

(1.4) InstalledKey -- an array collecting the keys of the ACPI tables
      we've thus far installed with the ACPI Table Protocol.

(1.5) SeenPointers -- a dictionary for one of the command types in the
      linker/loader script, ensuring that we don't attempt to install
      referenced tables more than once

And we have the following *permanent* resources (committed to the system
as a side-effect)

(2.1) S3Context -- effectively an S3 Boot Script representation. Some
      operations in the ACPI linker/loader script of QEMU -- namely, the
      "write pointer" commands -- convey information from the guest
      *back* to QEMU, over fw_cfg, and these must be replayed during S3
      resume. The system reset that occurs between suspend and resume
      clears out a bunch of information from QEMU, and these operations
      re-teach a set of information to QEMU.

(2.2) the *current runtime effect* of the same "write pointer" commands
      as discussed above. S3Context is for S3 resume, but the same
      commands must take primarily *right now*.

(2.3) the set of ACPI tables we've installed

Now, if you look at InstallQemuFwCfgTables() *before* your original
change (9fdc70af6ba8), you will see that:

(a) the publication of S3Context (permanent resource 2.1) as an S3 Boot
    Script is the *final* step in the function. Due to various reasons,
    rolling back this step is not possible. Therefore no action that
    could fail must ever be performed in the function after this action.

(b) The error path is constructed with *multiple labels* in such a way
    that the order of rollback operations is the *reverse order* of
    constructions, intermixing both temporary and permanent resources.
    When a constructive operation fails, a jump is taken such that the
    rollback steps for all constructive operations *not yet reached* are
    skipped, and the rollbacks for the constructions that *have*
    succeeded are all performed. We can call this a "cascade" of error
    handling labels.

(c) Note that in some (infrequent!) cases, *multiple* resources can be
    released under the same error handling label, but that is generally
    the exception. It is only possible if the nature of some
    construction operations is such that they cannot fail separately
    (but still need tear-down), or some of the resources are data sets
    which can be genuinely empty, and whose tear-down needs no
    separation between empty vs. non-empty.

    You can see this with the uninstallation of the ACPI tables we've
    installed (permanent resource 2.3), which is grouped together under
    the same "UninstallAcpiTables" label with the releasing of the
    "SeenPointers" dictionary (temporary resource 1.5).

    There is no grand theory behind this; it simply mirrors the
    construction path:

>   SeenPointers = OrderedCollectionInit (PointerCompare, PointerCompare);
>   if (SeenPointers == NULL) {
>     Status = EFI_OUT_OF_RESOURCES;
>     goto FreeKeys;
>   }
>
>   //
>   // second pass: identify and install ACPI tables
>   //
>   Installed = 0;
>   for (LoaderEntry = LoaderStart; LoaderEntry < LoaderEnd; ++LoaderEntry) {
>     if (LoaderEntry->Type == QemuLoaderCmdAddPointer) {
>       Status = Process2ndPassCmdAddPointer (
>                  &LoaderEntry->Command.AddPointer,
>                  Tracker,
>                  AcpiProtocol,
>                  InstalledKey,
>                  &Installed,
>                  SeenPointers
>                  );
>       if (EFI_ERROR (Status)) {
>         goto UninstallAcpiTables;
>       }
>     }
>   }

    If "SeenPointers" (a dictionary) is initialized successfully, but
    *any* ACPI table installation fails in the loop later, then we must
    tear down "SeenPointers" (regardless of how many entries it
    contains) *and* uninstall all previously installed ACPI tables as
    well (using the keys we've put into InstalledKey -- their number is
    given by Installed).

(d) Very importantly, because we have both permanent and temporary
    resources, we must fall through the error path at the end of the
    function even if the function as a whole succeeds. Therefore the
    tear-down steps on the error path for the *permanent* resources are
    all *restricted* to "Status" differing from EFI_SUCCESS. In other
    words, we go through the entire cascade in this ("grand success")
    case, but omit those rollback steps that would affect our permanent
    resources, which are now all done. Only the temporaries are torn
    down.

    At the very end, we have "return Status", which is consistent with
    the following two requirements (which the function does satisfy):

    - any time we perform a "goto" to an error handling label, Status
      must be set to an error value, either manually, or by the
      operation that just failed (and because of which we're taking the
      goto)

    - by the time we reach the first error handling label via *normal
      progression* (no goto), "Status" must be set to EFI_SUCCESS. This
      is achievable in two ways: one, set Status to EFI_SUCCESS
      explicitly, right there; or two: prove that, by that point, Status
      will have been set to EFI_SUCCESS. In this function, the latter is
      the case.

--*--

Okay. After this brief introduction, consider what you're doing with the
protocol installation:

- the protocol in the protocol database is a new permanent resource (a
  system-wide side-effect)

- installing the protocol is an operation that can fail

- protocol installation *can* be rolled back (I discussed the TPL
  concerns around this in the Bugzilla, but I'm not going to repeat them
  here). This fact -- i.e. that the protocol installation *can* be
  rolled back -- is really nice BTW, because we already have one
  operation in the function that *cannot*. We can't have *two* final
  steps in the function, so it's just as well that the protocol
  installation *need not* be the last one.

So what you need to do in patch#5 or patch#6 is this (diff shown with 6
lines of context):

> diff --git a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> index 38584fd608df..0c2b2215f316 100644
> --- a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> +++ b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> @@ -1246,50 +1246,63 @@ InstallQemuFwCfgTables (
>        if (EFI_ERROR (Status)) {
>          goto UninstallAcpiTables;
>        }
>      }
>    }
>
> +  //
> +  // Install a protocol to notify that the ACPI table provided by Qemu is
> +  // ready.
> +  //
> +  Status = gBS->InstallProtocolInterface (
> +                  &QemuAcpiHandle,
> +                  &gQemuAcpiTableNotifyProtocolGuid,
> +                  EFI_NATIVE_INTERFACE,
> +                  NULL
> +                  );
> +  if (EFI_ERROR (Status)) {
> +    goto UninstallAcpiTables;
> +  }
> +
>    //
>    // Translating the condensed QEMU_LOADER_WRITE_POINTER commands to ACPI S3
>    // Boot Script opcodes has to be the last operation in this function, because
>    // if it succeeds, it cannot be undone.
>    //
>    if (S3Context != NULL) {
>      Status = TransferS3ContextToBootScript (S3Context);
>      if (EFI_ERROR (Status)) {
> -      goto UninstallAcpiTables;
> +      goto UninstallQemuAcpiTableNotifyProtocol;
>      }
>
>      //
>      // Ownership of S3Context has been transferred.
>      //
>      S3Context = NULL;
>    }
>
> +UninstallQemuAcpiTableNotifyProtocol:
> +  if (EFI_ERROR (Status)) {
> +    gBS->UninstallProtocolInterface (
> +           QemuAcpiHandle,
> +           &gQemuAcpiTableNotifyProtocolGuid,
> +           NULL
> +           );
> +  }
> +
>  UninstallAcpiTables:
>    if (EFI_ERROR (Status)) {
>      //
>      // roll back partial installation
>      //
>      while (Installed > 0) {
>        --Installed;
>        AcpiProtocol->UninstallAcpiTable (AcpiProtocol, InstalledKey[Installed]);
>      }
>    } else {
>      DEBUG ((DEBUG_INFO, "%a: installed %d tables\n", __FUNCTION__, Installed));
> -    //
> -    // Install a protocol to notify that the ACPI table provided by Qemu is
> -    // ready.
> -    //
> -    gBS->InstallProtocolInterface (
> -           &mQemuAcpiHandle,
> -           &gQemuAcpiTableNotifyProtocolGuid,
> -           EFI_NATIVE_INTERFACE,
> -           &mAcpiNotifyProtocol
> -           );
>    }
>
>    for (SeenPointerEntry = OrderedCollectionMin (SeenPointers);
>         SeenPointerEntry != NULL;
>         SeenPointerEntry = SeenPointerEntry2)
>    {

Your proposed solution on the error path -- placing the uninstallation
under the existent "UninstallAcpiTables" label, and gating it with
(QemuAcpiHandle != NULL) -- may not be wrong, functionally speaking, but
it still breaks the cohesion of the function.

Specifically, the cohesion-break is that your proposal creates a path
through the code where the (QemuAcpiHandle != NULL) condition is
evaluated, on the error path, *without control ever having reached*
InstallProtocolInterface() previously. Namely, if one of the
Process2ndPassCmdAddPointer() calls fails, in the loop above
InstallProtocolInterface(). Saving that code path is why you need to set
"QemuAcpiHandle" to NULL at the top of the function.

In fact, I request that you *not* set QemuAcpiHandle to NULL at the top
of the function. Letting *only* InstallProtocolInterface() set it, upon
success, is safe, as long as you use the pattern shown above.

--*--

Okay, yet another topic. Before you ask, let me comment on this (from
preexistent code):

> FreeS3Context:
>   if (S3Context != NULL) {
>     ReleaseS3Context (S3Context);
>   }

This is being gated so because S3Context undergoes an ownership transfer
-- in that last, impossible-to-rollback step on the construction path
--, and because S3Context is *optional* as well. So at this point
S3Context is non-NULL precisely if we've had a failure *AND* S3Context
was *needed* in the first place. Just checking EFI_ERROR (Status) is
insufficient, because even on error, S3Context could still be NULL (it's
optional), and once we check for nullity, checking Status becomes
superfluous. (S3Context!=NULL at this point implies EFI_ERROR(Status),
per above.)

--*--

Finally, I *will* admit that the original placement of the DEBUG message
is not ideal. It should not be on an "else" branch under the
"UninstallAcpiTables" label. Instead, it should be, and *stay*, just
above the very first error handling label, where we effectively declare
the function "completed". So the original incorrect placement of the
DEBUG is my mistake, from earlier -- I'm sorry.

Thus, in patch#4, please do reinstate the DEBUG just above the
"UninstallAcpiTables" label. And in patch #5 or #6, keep it above the
new label "UninstallQemuAcpiTableNotifyProtocol".

Thanks
Laszlo

>
> diff --git a/OvmfPkg/AcpiPlatformDxe/CloudHvAcpi.c b/OvmfPkg/AcpiPlatformDxe/CloudHvAcpi.c
> index cbe8bb9b0c75..81f387438a64 100644
> --- a/OvmfPkg/AcpiPlatformDxe/CloudHvAcpi.c
> +++ b/OvmfPkg/AcpiPlatformDxe/CloudHvAcpi.c
> @@ -18,13 +18,9 @@
>
>  #include <Protocol/AcpiSystemDescriptionTable.h>
>  #include <Protocol/AcpiTable.h>
> -#include <Protocol/QemuAcpiTableNotify.h>                 // QEMU_ACPI_TABLE_NOTIFY_PROTOCOL
>
>  #include "AcpiPlatform.h"
>
> -EFI_HANDLE                       mChAcpiHandle = NULL;
> -QEMU_ACPI_TABLE_NOTIFY_PROTOCOL  mChAcpiNotifyProtocol;
> -
>  EFI_STATUS
>  EFIAPI
>  InstallCloudHvTablesTdx (
> @@ -33,13 +29,15 @@ InstallCloudHvTablesTdx (
>  {
>    EFI_STATUS  Status;
>    UINTN       TableHandle;
> +  EFI_HANDLE  ChAcpiHandle;
>
>    EFI_PEI_HOB_POINTERS         Hob;
>    EFI_ACPI_DESCRIPTION_HEADER  *CurrentTable;
>    EFI_ACPI_DESCRIPTION_HEADER  *DsdtTable;
>
> -  DsdtTable   = NULL;
> -  TableHandle = 0;
> +  DsdtTable    = NULL;
> +  TableHandle  = 0;
> +  ChAcpiHandle = NULL;
>
>    Hob.Guid = (EFI_HOB_GUID_TYPE *)GetFirstGuidHob (&gUefiOvmfPkgTdxAcpiHobGuid);
>
> @@ -92,12 +90,15 @@ InstallCloudHvTablesTdx (
>    // Install a protocol to notify that the ACPI table provided by CH is
>    // ready.
>    //
> -  gBS->InstallProtocolInterface (
> -         &mChAcpiHandle,
> -         &gQemuAcpiTableNotifyProtocolGuid,
> -         EFI_NATIVE_INTERFACE,
> -         &mChAcpiNotifyProtocol
> -         );
> +  Status = gBS->InstallProtocolInterface (
> +                  &ChAcpiHandle,
> +                  &gQemuAcpiTableNotifyProtocolGuid,
> +                  EFI_NATIVE_INTERFACE,
> +                  NULL
> +                  );
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
>
>    return EFI_SUCCESS;
>  }
> diff --git a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> index c8dee17c13e6..22340b13e3ee 100644
> --- a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> +++ b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> @@ -19,10 +19,7 @@
>  #include <Library/QemuFwCfgS3Lib.h>           // QemuFwCfgS3Enabled()
>  #include <Library/UefiBootServicesTableLib.h> // gBS
>
> -#include <Protocol/QemuAcpiTableNotify.h>
>  #include "AcpiPlatform.h"
> -EFI_HANDLE                       mQemuAcpiHandle = NULL;
> -QEMU_ACPI_TABLE_NOTIFY_PROTOCOL  mAcpiNotifyProtocol;
>
>  //
>  // The user structure for the ordered collection that will track the fw_cfg
> @@ -1103,6 +1100,9 @@ InstallQemuFwCfgTables (
>    ORDERED_COLLECTION_ENTRY  *TrackerEntry, *TrackerEntry2;
>    ORDERED_COLLECTION        *SeenPointers;
>    ORDERED_COLLECTION_ENTRY  *SeenPointerEntry, *SeenPointerEntry2;
> +  EFI_HANDLE                QemuAcpiHandle;
> +
> +  QemuAcpiHandle = NULL;
>
>    Status = QemuFwCfgFindFile ("etc/table-loader", &FwCfgItem, &FwCfgSize);
>    if (EFI_ERROR (Status)) {
> @@ -1249,6 +1249,20 @@ InstallQemuFwCfgTables (
>      }
>    }
>
> +  //
> +  // Install a protocol to notify that the ACPI table provided by Qemu is
> +  // ready.
> +  //
> +  Status = gBS->InstallProtocolInterface (
> +                  &QemuAcpiHandle,
> +                  &gQemuAcpiTableNotifyProtocolGuid,
> +                  EFI_NATIVE_INTERFACE,
> +                  NULL
> +                  );
> +  if (EFI_ERROR (Status)) {
> +    goto UninstallAcpiTables;
> +  }
> +
>    //
>    // Translating the condensed QEMU_LOADER_WRITE_POINTER commands to ACPI S3
>    // Boot Script opcodes has to be the last operation in this function, because
> @@ -1275,17 +1289,19 @@ UninstallAcpiTables:
>        --Installed;
>        AcpiProtocol->UninstallAcpiTable (AcpiProtocol, InstalledKey[Installed]);
>      }
> +
> +    //
> +    // Uninstall the notification if it has been installed.
> +    //
> +    if (QemuAcpiHandle != NULL) {
> +      gBS->UninstallProtocolInterface (
> +             QemuAcpiHandle,
> +             &gQemuAcpiTableNotifyProtocolGuid,
> +             NULL
> +             );
> +    }
>    } else {
> -    //
> -    // Install a protocol to notify that the ACPI table provided by Qemu is
> -    // ready.
> -    //
> -    gBS->InstallProtocolInterface (
> -           &mQemuAcpiHandle,
> -           &gQemuAcpiTableNotifyProtocolGuid,
> -           EFI_NATIVE_INTERFACE,
> -           &mAcpiNotifyProtocol
> -           );
> +    DEBUG ((DEBUG_INFO, "%a: installed %d tables\n", __FUNCTION__, Installed));
>    }
>
>    for (SeenPointerEntry = OrderedCollectionMin (SeenPointers);
> diff --git a/OvmfPkg/Include/Protocol/QemuAcpiTableNotify.h b/OvmfPkg/Include/Protocol/QemuAcpiTableNotify.h
> deleted file mode 100644
> index a3dd2fc1dc91..000000000000
> --- a/OvmfPkg/Include/Protocol/QemuAcpiTableNotify.h
> +++ /dev/null
> @@ -1,27 +0,0 @@
> -/** @file
> -
> -  SPDX-License-Identifier: BSD-2-Clause-Patent
> -
> -**/
> -
> -#ifndef QEMU_ACPI_TABLE_NOTIFY_H_
> -#define QEMU_ACPI_TABLE_NOTIFY_H_
> -
> -#define QEMU_ACPI_TABLE_NOTIFY_GUID \
> -  { 0x928939b2, 0x4235, 0x462f, { 0x95, 0x80, 0xf6, 0xa2, 0xb2, 0xc2, 0x1a, 0x4f } };
> -
> -///
> -/// Forward declaration
> -///
> -typedef struct _QEMU_ACPI_TABLE_NOTIFY_PROTOCOL QEMU_ACPI_TABLE_NOTIFY_PROTOCOL;
> -
> -///
> -/// Protocol structure
> -///
> -struct _QEMU_ACPI_TABLE_NOTIFY_PROTOCOL {
> -  UINT8    Notify;
> -};
> -
> -extern EFI_GUID  gQemuAcpiTableNotifyProtocolGuid;
> -
> -#endif



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#97978): https://edk2.groups.io/g/devel/message/97978
Mute This Topic: https://groups.io/mt/96050679/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list