Re: Re: [edk2-devel] [PATCH] BaseMemoryLibSse2: Take advantage of write combining buffers

gaoliming gaoliming at byosoft.com.cn
Fri Oct 16 00:59:44 UTC 2020


I have created a pull request for this patch set: https://github.com/tianocore/edk2/pull/1009

Thanks
Liming
> -----Original Message-----
> From: bounce+27952+66282+4905953+8761045 at groups.io
> <bounce+27952+66282+4905953+8761045 at groups.io> On Behalf Of Compostella,
> Jeremy
> Sent: October 16, 2020 1:48
> To: gaoliming <gaoliming at byosoft.com.cn>; devel at edk2.groups.io
> Subject: Re: Re: [edk2-devel] [PATCH] BaseMemoryLibSse2: Take advantage
> of write combining buffers
> 
> Thanks for the review, Liming.  Is there anything else I should do to help
> with the review/merge process?
> 
> Regards,
> Jeremy
> 
> gaoliming <gaoliming at byosoft.com.cn> writes:
> 
> > This is a good enhancement. The change looks good to me.
> >
> > Reviewed-by: Liming Gao <gaoliming at byosoft.com.cn>
> >
> > Thanks
> > Liming
> >> -----Original Message-----
> >> From: bounce+27952+66093+4905953+8761045 at groups.io
> >> <bounce+27952+66093+4905953+8761045 at groups.io> On Behalf Of Compostella,
> >> Jeremy
> >> Sent: October 10, 2020 4:43
> >> To: devel at edk2.groups.io
> >> Subject: [edk2-devel] [PATCH] BaseMemoryLibSse2: Take advantage of write
> >> combining buffers
> >>
> >> The current SSE2 implementations of the ZeroMem(), SetMem(),
> >> SetMem16(), SetMem32() and SetMem64() functions write 16 bytes at a
> >> time. This hurts performance so badly that they are even slower than
> >> a simple 'rep stos' (4% slower) in regular DRAM.
> >>
> >> To take full advantage of the 'movntdq' instruction, it is better to
> >> "queue" a total of 64 bytes in the write combining buffers.  This
> >> patch implements such a change.  Below is a table where I measured
> >> (with 'rdtsc') the time to write an entire 100MB RAM buffer. These
> >> functions operate almost two times faster.
> >>
> >> | Function | Arch | Untouched | 64 bytes | Result |
> >> |----------+------+-----------+----------+--------|
> >> | ZeroMem  | Ia32 |  17765947 |  9136062 | 1.945x |
> >> | ZeroMem  | X64  |  17525170 |  9233391 | 1.898x |
> >> | SetMem   | Ia32 |  17522291 |  9137272 | 1.918x |
> >> | SetMem   | X64  |  17949261 |  9176978 | 1.956x |
> >> | SetMem16 | Ia32 |  18219673 |  9372062 | 1.944x |
> >> | SetMem16 | X64  |  17523331 |  9275184 | 1.889x |
> >> | SetMem32 | Ia32 |  18495036 |  9273053 | 1.994x |
> >> | SetMem32 | X64  |  17368864 |  9285885 | 1.870x |
> >> | SetMem64 | Ia32 |  18564473 |  9241362 | 2.009x |
> >> | SetMem64 | X64  |  17506951 |  9280148 | 1.886x |
> >>
> >> Signed-off-by: Jeremy Compostella <jeremy.compostella at intel.com>
> >> ---
> >>  .../BaseMemoryLibSse2/Ia32/SetMem.nasm        | 11 ++++++----
> >>  .../BaseMemoryLibSse2/Ia32/SetMem16.nasm      | 11 ++++++----
> >>  .../BaseMemoryLibSse2/Ia32/SetMem32.nasm      |  9 ++++++---
> >>  .../BaseMemoryLibSse2/Ia32/SetMem64.nasm      | 20 +++++++++++++++----
> >>  .../BaseMemoryLibSse2/Ia32/ZeroMem.nasm       | 11 ++++++----
> >>  .../Library/BaseMemoryLibSse2/X64/SetMem.nasm |  9 ++++++---
> >>  .../BaseMemoryLibSse2/X64/SetMem16.nasm       | 11 ++++++----
> >>  .../BaseMemoryLibSse2/X64/SetMem32.nasm       |  9 ++++++---
> >>  .../BaseMemoryLibSse2/X64/SetMem64.nasm       | 19 ++++++++++++++----
> >>  .../BaseMemoryLibSse2/X64/ZeroMem.nasm        | 13 +++++++-----
> >>  10 files changed, 85 insertions(+), 38 deletions(-)
> >>
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem.nasm b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem.nasm
> >> index 24313cb4b3..a8744300c6 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem.nasm
> >> @@ -34,7 +34,7 @@ ASM_PFX(InternalMemSetMem):
> >>      mov     al, [esp + 16]              ; al <- Value
> >>      xor     ecx, ecx
> >>      sub     ecx, edi
> >> -    and     ecx, 15                     ; ecx + edi aligns on 16-byte boundary
> >> +    and     ecx, 63                     ; ecx + edi aligns on 16-byte boundary
> >>      jz      .0
> >>      cmp     ecx, edx
> >>      cmova   ecx, edx
> >> @@ -42,8 +42,8 @@ ASM_PFX(InternalMemSetMem):
> >>      rep     stosb
> >>  .0:
> >>      mov     ecx, edx
> >> -    and     edx, 15
> >> -    shr     ecx, 4                      ; ecx <- # of DQwords to set
> >> +    and     edx, 63
> >> +    shr     ecx, 6                      ; ecx <- # of DQwords to set
> >>      jz      @SetBytes
> >>      mov     ah, al                      ; ax <- Value | (Value << 8)
> >>      add     esp, -16
> >> @@ -53,7 +53,10 @@ ASM_PFX(InternalMemSetMem):
> >>      movlhps xmm0, xmm0                  ; xmm0 <- Value repeats 16 times
> >>  .1:
> >>      movntdq [edi], xmm0                 ; edi should be 16-byte aligned
> >> -    add     edi, 16
> >> +    movntdq [edi + 16], xmm0
> >> +    movntdq [edi + 32], xmm0
> >> +    movntdq [edi + 48], xmm0
> >> +    add     edi, 64
> >>      loop    .1
> >>      mfence
> >>      movdqu  xmm0, [esp]                 ; restore xmm0
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem16.nasm b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem16.nasm
> >> index 6e308b5594..d461ee086c 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem16.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem16.nasm
> >> @@ -33,7 +33,7 @@ ASM_PFX(InternalMemSetMem16):
> >>      mov     edi, [esp + 8]
> >>      xor     ecx, ecx
> >>      sub     ecx, edi
> >> -    and     ecx, 15                     ; ecx + edi aligns on 16-byte boundary
> >> +    and     ecx, 63                     ; ecx + edi aligns on 16-byte boundary
> >>      mov     eax, [esp + 16]
> >>      jz      .0
> >>      shr     ecx, 1
> >> @@ -43,15 +43,18 @@ ASM_PFX(InternalMemSetMem16):
> >>      rep     stosw
> >>  .0:
> >>      mov     ecx, edx
> >> -    and     edx, 7
> >> -    shr     ecx, 3
> >> +    and     edx, 31
> >> +    shr     ecx, 5
> >>      jz      @SetWords
> >>      movd    xmm0, eax
> >>      pshuflw xmm0, xmm0, 0
> >>      movlhps xmm0, xmm0
> >>  .1:
> >>      movntdq [edi], xmm0                 ; edi should be 16-byte aligned
> >> -    add     edi, 16
> >> +    movntdq [edi + 16], xmm0
> >> +    movntdq [edi + 32], xmm0
> >> +    movntdq [edi + 48], xmm0
> >> +    add     edi, 64
> >>      loop    .1
> >>      mfence
> >>  @SetWords:
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem32.nasm b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem32.nasm
> >> index 2cfc8cb0dd..3ffdcd07d7 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem32.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem32.nasm
> >> @@ -43,14 +43,17 @@ ASM_PFX(InternalMemSetMem32):
> >>      rep     stosd
> >>  .0:
> >>      mov     ecx, edx
> >> -    and     edx, 3
> >> -    shr     ecx, 2
> >> +    and     edx, 15
> >> +    shr     ecx, 4
> >>      jz      @SetDwords
> >>      movd    xmm0, eax
> >>      pshufd  xmm0, xmm0, 0
> >>  .1:
> >>      movntdq [edi], xmm0
> >> -    add     edi, 16
> >> +    movntdq [edi + 16], xmm0
> >> +    movntdq [edi + 32], xmm0
> >> +    movntdq [edi + 48], xmm0
> >> +    add     edi, 64
> >>      loop    .1
> >>      mfence
> >>  @SetDwords:
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem64.nasm b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem64.nasm
> >> index e153495a68..cd000648ae 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem64.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/Ia32/SetMem64.nasm
> >> @@ -38,17 +38,29 @@ ASM_PFX(InternalMemSetMem64):
> >>      add     edx, 8
> >>      dec     ecx
> >>  .0:
> >> -    shr     ecx, 1
> >> +    push    ebx
> >> +    mov     ebx, ecx
> >> +    and     ebx, 7
> >> +    shr     ecx, 3
> >>      jz      @SetQwords
> >>      movlhps xmm0, xmm0
> >>  .1:
> >>      movntdq [edx], xmm0
> >> -    lea     edx, [edx + 16]
> >> +    movntdq [edx + 16], xmm0
> >> +    movntdq [edx + 32], xmm0
> >> +    movntdq [edx + 48], xmm0
> >> +    lea     edx, [edx + 64]
> >>      loop    .1
> >>      mfence
> >>  @SetQwords:
> >> -    jnc     .2
> >> +    test    ebx, ebx
> >> +    jz .3
> >> +    mov     ecx, ebx
> >> +.2
> >>      movq    qword [edx], xmm0
> >> -.2:
> >> +    lea     edx, [edx + 8]
> >> +    loop    .2
> >> +.3:
> >> +    pop ebx
> >>      ret
> >>
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/Ia32/ZeroMem.nasm b/MdePkg/Library/BaseMemoryLibSse2/Ia32/ZeroMem.nasm
> >> index cd34006f59..0e0828551b 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/Ia32/ZeroMem.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/Ia32/ZeroMem.nasm
> >> @@ -33,7 +33,7 @@ ASM_PFX(InternalMemZeroMem):
> >>      xor     ecx, ecx
> >>      sub     ecx, edi
> >>      xor     eax, eax
> >> -    and     ecx, 15
> >> +    and     ecx, 63
> >>      jz      .0
> >>      cmp     ecx, edx
> >>      cmova   ecx, edx
> >> @@ -41,13 +41,16 @@ ASM_PFX(InternalMemZeroMem):
> >>      rep     stosb
> >>  .0:
> >>      mov     ecx, edx
> >> -    and     edx, 15
> >> -    shr     ecx, 4
> >> +    and     edx, 63
> >> +    shr     ecx, 6
> >>      jz      @ZeroBytes
> >>      pxor    xmm0, xmm0
> >>  .1:
> >>      movntdq [edi], xmm0
> >> -    add     edi, 16
> >> +    movntdq [edi + 16], xmm0
> >> +    movntdq [edi + 32], xmm0
> >> +    movntdq [edi + 48], xmm0
> >> +    add     edi, 64
> >>      loop    .1
> >>      mfence
> >>  @ZeroBytes:
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem.nasm b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem.nasm
> >> index 5bd1c2262d..28b11ee586 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem.nasm
> >> @@ -42,8 +42,8 @@ ASM_PFX(InternalMemSetMem):
> >>      rep     stosb
> >>  .0:
> >>      mov     rcx, rdx
> >> -    and     rdx, 15
> >> -    shr     rcx, 4
> >> +    and     rdx, 63
> >> +    shr     rcx, 6
> >>      jz      @SetBytes
> >>      mov     ah, al                      ; ax <- Value repeats twice
> >>      movdqa  [rsp + 0x10], xmm0           ; save xmm0
> >> @@ -52,7 +52,10 @@ ASM_PFX(InternalMemSetMem):
> >>      movlhps xmm0, xmm0                  ; xmm0 <- Value repeats 16 times
> >>  .1:
> >>      movntdq [rdi], xmm0                 ; rdi should be 16-byte aligned
> >> -    add     rdi, 16
> >> +    movntdq [rdi + 16], xmm0
> >> +    movntdq [rdi + 32], xmm0
> >> +    movntdq [rdi + 48], xmm0
> >> +    add     rdi, 64
> >>      loop    .1
> >>      mfence
> >>      movdqa  xmm0, [rsp + 0x10]           ; restore xmm0
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem16.nasm b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem16.nasm
> >> index 90d159820a..375be19313 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem16.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem16.nasm
> >> @@ -33,7 +33,7 @@ ASM_PFX(InternalMemSetMem16):
> >>      mov     r9, rdi
> >>      xor     rcx, rcx
> >>      sub     rcx, rdi
> >> -    and     rcx, 15
> >> +    and     rcx, 63
> >>      mov     rax, r8
> >>      jz      .0
> >>      shr     rcx, 1
> >> @@ -43,15 +43,18 @@ ASM_PFX(InternalMemSetMem16):
> >>      rep     stosw
> >>  .0:
> >>      mov     rcx, rdx
> >> -    and     edx, 7
> >> -    shr     rcx, 3
> >> +    and     edx, 31
> >> +    shr     rcx, 5
> >>      jz      @SetWords
> >>      movd    xmm0, eax
> >>      pshuflw xmm0, xmm0, 0
> >>      movlhps xmm0, xmm0
> >>  .1:
> >>      movntdq [rdi], xmm0
> >> -    add     rdi, 16
> >> +    movntdq [rdi + 16], xmm0
> >> +    movntdq [rdi + 32], xmm0
> >> +    movntdq [rdi + 48], xmm0
> >> +    add     rdi, 64
> >>      loop    .1
> >>      mfence
> >>  @SetWords:
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem32.nasm b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem32.nasm
> >> index 928e086889..5d12beaa9a 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem32.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem32.nasm
> >> @@ -43,14 +43,17 @@ ASM_PFX(InternalMemSetMem32):
> >>      rep     stosd
> >>  .0:
> >>      mov     rcx, rdx
> >> -    and     edx, 3
> >> -    shr     rcx, 2
> >> +    and     edx, 15
> >> +    shr     rcx, 4
> >>      jz      @SetDwords
> >>      movd    xmm0, eax
> >>      pshufd  xmm0, xmm0, 0
> >>  .1:
> >>      movntdq [rdi], xmm0
> >> -    add     rdi, 16
> >> +    movntdq [rdi + 16], xmm0
> >> +    movntdq [rdi + 32], xmm0
> >> +    movntdq [rdi + 48], xmm0
> >> +    add     rdi, 64
> >>      loop    .1
> >>      mfence
> >>  @SetDwords:
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem64.nasm b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem64.nasm
> >> index d771810542..265983d5ad 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem64.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/X64/SetMem64.nasm
> >> @@ -37,17 +37,28 @@ ASM_PFX(InternalMemSetMem64):
> >>      add     rdx, 8
> >>      dec     rcx
> >>  .0:
> >> -    shr     rcx, 1
> >> +    push    rbx
> >> +    mov     rbx, rcx
> >> +    and     rbx, 7
> >> +    shr     rcx, 3
> >>      jz      @SetQwords
> >>      movlhps xmm0, xmm0
> >>  .1:
> >>      movntdq [rdx], xmm0
> >> -    lea     rdx, [rdx + 16]
> >> +    movntdq [rdx + 16], xmm0
> >> +    movntdq [rdx + 32], xmm0
> >> +    movntdq [rdx + 48], xmm0
> >> +    lea     rdx, [rdx + 64]
> >>      loop    .1
> >>      mfence
> >>  @SetQwords:
> >> -    jnc     .2
> >> -    mov     [rdx], r8
> >> +    push    rdi
> >> +    mov     rcx, rbx
> >> +    mov     rax, r8
> >> +    mov     rdi, rdx
> >> +    rep     stosq
> >> +    pop     rdi
> >>  .2:
> >> +    pop rbx
> >>      ret
> >>
> >> diff --git a/MdePkg/Library/BaseMemoryLibSse2/X64/ZeroMem.nasm b/MdePkg/Library/BaseMemoryLibSse2/X64/ZeroMem.nasm
> >> index 5ddcae9ca5..21f504e3b7 100644
> >> --- a/MdePkg/Library/BaseMemoryLibSse2/X64/ZeroMem.nasm
> >> +++ b/MdePkg/Library/BaseMemoryLibSse2/X64/ZeroMem.nasm
> >> @@ -32,7 +32,7 @@ ASM_PFX(InternalMemZeroMem):
> >>      xor     rcx, rcx
> >>      xor     eax, eax
> >>      sub     rcx, rdi
> >> -    and     rcx, 15
> >> +    and     rcx, 63
> >>      mov     r8, rdi
> >>      jz      .0
> >>      cmp     rcx, rdx
> >> @@ -41,13 +41,16 @@ ASM_PFX(InternalMemZeroMem):
> >>      rep     stosb
> >>  .0:
> >>      mov     rcx, rdx
> >> -    and     edx, 15
> >> -    shr     rcx, 4
> >> +    and     edx, 63
> >> +    shr     rcx, 6
> >>      jz      @ZeroBytes
> >>      pxor    xmm0, xmm0
> >>  .1:
> >> -    movntdq [rdi], xmm0                 ; rdi should be 16-byte aligned
> >> -    add     rdi, 16
> >> +    movntdq [rdi], xmm0
> >> +    movntdq [rdi + 16], xmm0
> >> +    movntdq [rdi + 32], xmm0
> >> +    movntdq [rdi + 48], xmm0
> >> +    add     rdi, 64
> >>      loop    .1
> >>      mfence
> >>  @ZeroBytes:
> >> --
> >> 2.25.3
> >>
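For readers following the arithmetic in the patch, here is a minimal C sketch of how a byte-oriented routine such as SetMem or ZeroMem appears to split a request after this change: a head of up to 63 bytes to reach a 64-byte boundary, a run of 64-byte blocks (four 'movntdq' stores each), and a tail of leftover bytes. The names SplitPlan and plan_split are hypothetical and not part of EDK2. Subtracting the head from the remaining length is an assumption based on context that the diff hunks do not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration only; not part of EDK2. */
typedef struct {
    size_t head;    /* bytes stored with 'rep stosb' to reach a 64-byte boundary */
    size_t blocks;  /* 64-byte blocks, each stored with four 'movntdq' */
    size_t tail;    /* leftover bytes stored after the streaming loop */
} SplitPlan;

/* Model of the length bookkeeping for a byte-granular fill of `len` bytes
 * starting at address `addr`. */
static SplitPlan plan_split(uintptr_t addr, size_t len)
{
    SplitPlan p;
    size_t total = len;

    /* "xor ecx, ecx / sub ecx, edi / and ecx, 63":
     * distance from addr up to the next 64-byte boundary. */
    p.head = (size_t)(((uintptr_t)0 - addr) & 63u);

    /* "cmp ecx, edx / cmova ecx, edx": never write past the request. */
    if (p.head > len) {
        p.head = len;
    }

    /* Assumed from elided context: the head is subtracted from the length. */
    len -= p.head;

    p.blocks = len >> 6;   /* "shr ecx, 6": number of 64-byte blocks */
    p.tail   = len & 63u;  /* "and edx, 63": bytes left for the tail */

    /* The three parts must cover the request exactly. */
    assert(p.head + p.blocks * 64 + p.tail == total);
    return p;
}
```

For example, plan_split((uintptr_t)0x1005, 200) gives head=59, blocks=2 and tail=13, which add up to the 200 bytes requested.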