Re: Reply: [edk2-devel] Multithreaded compression with LZMA2

Daniel Schaefer daniel.schaefer at hpe.com
Wed Dec 2 08:24:47 UTC 2020


On 12/2/20 1:21 PM, gaoliming wrote:
> Daniel:
> 
>   Can you provide the compressed image size? And which image is being compressed? Is it the generated FV image?
> 
> Thanks
> 
> Liming
> 
> *From:* bounce+27952+68159+4905953+8761045 at groups.io <bounce+27952+68159+4905953+8761045 at groups.io> *On Behalf Of* Andrew Fish via groups.io
> *Sent:* December 2, 2020 11:37
> *To:* devel at edk2.groups.io; daniel.schaefer at hpe.com
> *Cc:* derek.lin2 at hpe.com
> *Subject:* Re: [edk2-devel] Multithreaded compression with LZMA2
> 
> 
> 
>     On Dec 1, 2020, at 6:59 PM, Daniel Schaefer <daniel.schaefer at hpe.com <mailto:daniel.schaefer at hpe.com>> wrote:
> 
>     Hi everyone,
> 
>     I'm looking into how to speed up the build process and noticed that our build
>     uses LZMA to compress the main firmware volume. Since it's quite big it takes a
>     while but only uses one CPU thread.
> 
>     LZMA2 is a version of LZMA which can be multi-threaded and achieve much faster
>     compression times. I did a quick benchmark using the `xz` command-line tool,
>     which uses a modified version of the LZMA SDK that EDK2 uses. The results are:
> 
>     Uncompressed size: 64M
> 
>     | Algo  | Comp Time | Decomp Time | Size | Threads |
>     | ----- | --------- | ----------- | ---- | ------- |
>     | LZMA  |    19.67s |        0.9s | 9.1M |       1 |
>     | LZMA2 |    20.11s |        1.2s | 9.2M |       1 |
>     | LZMA2 |     8.31s |        1.0s | 9.4M |       4 |
> 
>     Using those commands:
> 
>     time xz --format=lzma testfile
>     time unlzma testfile.lzma
> 
>     time xz --lzma2 testfile
>     time unxz testfile.xz
> 
>     time xz -T4 --lzma2 testfile
>     time unxz testfile.xz
> 
>     This is quite a significant improvement of build time, while decompression time
>     and size only slightly increase. If that's a concern, then LZMA2 could be used
>     for development only.
> 
>     I haven't investigated the details of how to support this in the code but it
>     appears to be a simple change, since the LZMA SDK that we use already supports
>     LZMA2.
> 
>     What do you think?
> 
> Interesting idea. What OS did you use? I tried this on macOS on some larger FVs and I did not see much difference. I tried a 17.5 MiB FV and it was around 3 seconds both ways.
> 
> Maybe it would be worthwhile to see how it works on various systems? I guess it might be data set related?

Hi Andrew and Liming,

The FV file is our main FV with the majority of DXEs. It's 64MB uncompressed
and 9MB compressed, as mentioned before. Unfortunately I cannot share that
particular file with you, but I am also surprised that it compresses so well, to
just 14% of its original size.

I'm running my tests on X64 Linux. I ran the same tests again on a more
powerful machine using the hyperfine command. It runs the test case 3 times to
warm up (e.g. caches) and then runs it 10 times, which count towards the average.
The result is the same as before: with 4 threads, compression takes just 40% of
the single-threaded time. I don't observe any further speedup when using all 16
threads of the CPU.

# Simple LZMA
$ hyperfine --warmup 3 'xz -k --format=lzma testfile && rm testfile.lzma'
Benchmark #1: xz -k --format=lzma testfile && rm testfile.lzma
   Time (mean ± σ):     12.755 s ±  0.151 s    [User: 12.691 s, System: 0.064 s]
   Range (min … max):   12.568 s … 12.991 s    10 runs

# LZMA2 with single thread
$ hyperfine --warmup 3 'xz -k -T1 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T1 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):     12.838 s ±  0.149 s    [User: 12.783 s, System: 0.055 s]
   Range (min … max):   12.546 s … 13.053 s    10 runs

# LZMA2 with 4 threads
$ hyperfine --warmup 3 'xz -k -T4 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T4 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      5.241 s ±  0.025 s    [User: 13.537 s, System: 0.177 s]
   Range (min … max):    5.227 s …  5.302 s    10 runs

Using xz from mingw64 on Windows 10 X64, the 4-threaded compression takes 47% of the single-threaded time.
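
One possible reading of the Linux numbers above, as a rough sketch rather than a confirmed explanation: the xz man page says that in multi-threaded mode the input is split into blocks, and that the default block size is three times the LZMA2 dictionary size (8 MiB at the default preset, so 24 MiB). Assuming the 64MB file is 64 MiB and the default preset was in effect, that gives only 3 blocks, which would be consistent with a floor around 40% and no gain beyond 4 threads. The helper names `percent_of` and `block_count` are made up for this illustration.

```shell
#!/bin/sh
# percent_of A B: print A as a rounded whole-number percentage of B.
percent_of() { echo $(( ($1 * 100 + $2 / 2) / $2 )); }

# block_count SIZE_MIB BLOCK_MIB: number of blocks (ceiling division).
block_count() { echo $(( ($1 + $2 - 1) / $2 )); }

# Measured above: 4-thread mean vs 1-thread LZMA2 mean, in milliseconds.
percent_of 5241 12838      # prints 41

# Assumption: 64 MiB input, 24 MiB default block size (3 x 8 MiB dictionary).
block_count 64 24          # prints 3

# Wall time cannot drop below the share of the largest block (24 of 64 MiB).
percent_of 24 64           # prints 38
```

If this reading is right, the measured 41% sits close to the 38% lower bound, and extra threads beyond the block count have nothing to work on.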

---

I also wanted to try it on a different file and ran the same benchmarks on a
16MB file, only to discover that compression is no faster with multithreading.
The file shrinks from 16MB to 9MB. However, after my tests I discovered that
this is the combined FV, which includes a few other compressed FVs.
Here the multithreaded command even produces an output that is 0.2MB smaller.

$ hyperfine --warmup 3 'xz -k -T1 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T1 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      2.874 s ±  0.134 s    [User: 2.825 s, System: 0.049 s]
   Range (min … max):    2.751 s …  3.088 s    10 runs
$ ls -lh
-rw-r--r-- 1 zoid users  16M Dec  2 15:29 testfile
-rw-r--r-- 1 zoid users 9.4M Dec  2 15:29 testfile.lzma

$ hyperfine --warmup 3 'xz -k -T4 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T4 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      2.874 s ±  0.108 s    [User: 2.818 s, System: 0.070 s]
   Range (min … max):    2.775 s …  3.081 s    10 runs
$ ls -lh
-rw-r--r-- 1 zoid users  16M Dec  2 15:29 testfile
-rw-r--r-- 1 zoid users 9.2M Dec  2 15:29 testfile.xz

So even with a file that doesn't compress as well, the performance doesn't get any worse.
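
A hedged note on why the 16MB case may show no speedup: if xz's documented default block size of 24 MiB applies (an assumption, since the preset used is not stated), a 16 MiB input fits in a single block and `-T4` has nothing to split. The sketch below estimates how many threads can be kept busy and, only if `xz` and `testfile` are present, tries a smaller block size via xz's `--block-size` option. The 4MiB value and the `usable_threads` helper are illustrative choices, not something measured in this thread.

```shell
#!/bin/sh
# usable_threads SIZE_MIB BLOCK_MIB THREADS:
# print how many threads can work at once, i.e. min(block count, THREADS).
usable_threads() {
    blocks=$(( ($1 + $2 - 1) / $2 ))   # ceiling division
    if [ "$blocks" -lt "$3" ]; then echo "$blocks"; else echo "$3"; fi
}

usable_threads 16 24 4   # assumed default 24 MiB blocks: prints 1
usable_threads 16 4 4    # hypothetical 4 MiB blocks: prints 4

# Run a real compression only when xz and the input file are available.
if command -v xz >/dev/null 2>&1 && [ -f testfile ]; then
    xz -k -T4 --block-size=4MiB --lzma2 testfile \
        && ls -l testfile.xz \
        && rm testfile.xz
fi
```

Smaller blocks give each thread less context to work with, so the output would be expected to grow somewhat; whether that trade-off is acceptable would need to be measured on the actual FV.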

Thanks,
Daniel

