[edk2-devel] [Patch 0/9] Enable multiple process AutoGen

Laszlo Ersek lersek at redhat.com
Sun Jul 21 12:26:06 UTC 2019


Hi Bob,

On 07/18/19 08:14, Bob Feng wrote:
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=1875
>
> In order to improve the build performance, we implemented
> multiple-processes AutoGen. This change will reduce 20% time
> for AutoGen phase.
>
> The design document can be got from:
> https://edk2.groups.io/g/devel/files/Designs/2019/0627/Multiple-thread-AutoGen.pdf
>
> This patch serial pass the build of Ovmf, MinKabylake, MinPurley,
> packages under Edk2 repository and intel client and server platforms.

I've done some basic regression-testing with this set, applied on top of
current master (= commit 8ff68cd5e4c9, "ShellPkg: acpiview: DBG2: Remove
redundant forward declarations", 2019-07-19).

(1) The build process now seems to produce files called

      GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_AARCH64.bin
      GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_ARM.bin
      GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_IA32.bin
      GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_X64.bin

    in the project root directory. (The GUIDs match the PLATFORM_GUID
    values from ArmVirtQemu.dsc and OvmfPkg*.dsc.)

    They seem to come from patch "BaseTools: Enable Multiple Process
    AutoGen".

    Can we place these files under the Build/ directory somewhere,
    instead?

(2) The QEMU project now bundles edk2 (as a git submodule for the edk2
    tree, and some firmware binaries built from that). The build scripts
    carried by QEMU set PYTHON_COMMAND to python2, in order to work
    around TianoCore BZ#1607.

    For regression-testing this set, I ran

    $ make -C roms efi

    with QEMU's edk2 submodule advanced to commit 8ff68cd5e4c9, plus
    this set applied (see above), and with QEMU itself being at
    e2b47666fe15.

    The build itself passes fine, so that's good.

    But, some (not all) of the firmware binaries are *misbuilt*. I'll
    elaborate below (independently of QEMU).

(3) Consider the following test. (Note that this is independent of the
    QEMU project entirely.)

    (3a) Set up a pristine build environment:

         # Open a new shell, then:
         $ git clean -ffdx
         $ git reset --hard
         $ git submodule deinit --force --all
         $ git checkout master
         $ git submodule update --init
         $ export PYTHON_COMMAND=python2
         $ source edksetup.sh
         $ nice make -C "$EDK_TOOLS_PATH" \
             -j $(getconf _NPROCESSORS_ONLN)

    (3b) Build OvmfPkgIA32 with one set of features, and capture a
         checksum of the resultant firmware binary:

         $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \
             -t GCC48
         $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd
         8d56575dd3b5c89ac6f1829ae4f41b55

    (3c) *Re*build the same platform with a different set of features,
         and capture a checksum again:

         $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \
             -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE
         $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd
         ed79052e9add242174ccefc78a5bddb3

    Okay, now repeat all three steps from zero, but with the present
    feature patch set applied:

    (3d) Repeat (3a), but rather than checking out the master branch
         with "git checkout", check out the topic branch that is the
         same "master" commit, *plus* the present series applied on top.

         Don't forget to start a new shell first!

    (3e) Repeat (3b):

         $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \
             -t GCC48
         $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd
         8d56575dd3b5c89ac6f1829ae4f41b55

         Notice that the checksum is identical to that from (3b), so
         all's good.

    (3f) Repeat (3c):

         $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \
             -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE
         $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd
         baf08c5a9ecb0adc754f5b48410b6476

         This is the problem: the checksum does not match the one from
         (3c). And, the binary from (3f) is in fact mis-built, it
         crashes immediately when launched on QEMU/KVM, with "emulation
         failure".

    Thus:

    - Without the patch set, if I build a platform, then *rebuild* it
      with different -D flags, then both builds produce proper outputs.

    - With the patch series applied however, only the initial build
      produces good results. The *rebuild*, with different -D flags, is
      "poisoned" by the artifacts still laying around from the first
      build.

    In other words, this is a "stale cache" problem.

(4) Not a show stopper, but a divergence from previous behavior:

    When one presses Ctrl-S (the "stop" character) in a terminal, the
    terminal output / autoscrolling is suspended, until Ctrl-Q (the
    "start" character) is pressed. This is generally used for two
    things: scrolling back the terminal history manually, for reading
    something that scrolled off the screen, and for briefly pausing
    whatever processing the program is doing.

    The idea being, if the program is blocked in a terminal write
    operation, it cannot do anything else.

    The present patch set changes the lastly mentioned behavior, because
    the logging is now decoupled to a separate thread ("BaseTools: Add
    LogAgent to support multiple process Autogen"). As a result, if the
    *output* of LogAgent is blocked, that does not block the *input* of
    LogAgent, and so the queue / buffer in LogAgent grows without bound,
    and the build continues even though the terminal output is stopped.

    This is a common initial symptom of programs with internal queues --
    it's usually good to implement internal flow control, under which
    any given queue has a limit, and if that limit is reached, the queue
    blocks the producer threads feeding it.

    Again, this is not a show stopper, just a divergence from previous
    behavior which we should be aware of, and possibly document
    somewhere.

    Importantly, Ctrl-Z (the "suspend" character) works just fine, for
    *both* purposes described above. (And then people can resume the
    build with "fg", like always.)

(5) Two questions out of interest:

    - Has this series been tested for restarting builds that were
      interrupted with Ctrl-C? Can the new build "pick up" where the
      previous build was interrupted?

    - Do we intend to offer a flag for disabling this feature? I'm not
      implying that the old code should be preserved as an alternative,
      but perhaps some internal constants related to parallelization can
      be tweaked such that the ultimate behavior becomes serial. Is the
      "-n 1" option supposed to have that effect?

    The second question is relevant because people might want to disable
    the new feature until it stabilizes (see issue (3) for example).

Thanks,
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#44072): https://edk2.groups.io/g/devel/message/44072
Mute This Topic: https://groups.io/mt/32512451/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list