[edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation

Wang, Jian J jian.j.wang at intel.com
Fri Oct 16 07:41:24 UTC 2020


The build tool fails when reading a file, for example when calling trim
to clean preprocessed source files, if the tool runs on an OS with a
non-Western code page and the source file contains non-ASCII characters.

Even utf-8 has problems when it encounters some bytes that are valid
in cp1252 (such as 0x92, 0x96, 0xa0, etc.).
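
As a minimal illustration (not part of the patch), the sketch below
feeds the cp1252 bytes mentioned above to the utf-8 decoder. The sample
byte strings are made up for the demonstration.

```python
# Bytes that are valid cp1252 text but are not valid utf-8 on their own.
samples = [b"don\x92t", b"a \x96 b", b"a\xa0b"]

failures = []
for raw in samples:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # 0x92, 0x96 and 0xa0 cannot start a utf-8 sequence.
        failures.append(raw)
    # The same bytes decode without error as cp1252.
    raw.decode("cp1252")

print(len(failures), "of", len(samples), "samples rejected by utf-8")
```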

Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte value from 00 to FF
to a character and therefore never raises an encoding/decoding error.
It behaves almost the same as reading the file in binary mode.
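
A small sketch of that property, offered as an illustration rather than
as part of the patch: every byte value decodes to the code point with
the same number and encodes back unchanged. One difference from binary
mode that remains is that text mode still applies newline translation.

```python
all_bytes = bytes(range(0x100))

# Decoding maps byte N to the code point U+00NN, so nothing can fail.
text = all_bytes.decode("latin-1")
same_values = all(ord(ch) == b for ch, b in zip(text, all_bytes))

# Encoding the result restores the original bytes exactly.
round_trip = text.encode("latin-1") == all_bytes

print(same_values, round_trip)
```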

cp1252 is similar to latin-1, but it cannot encode the characters
'\x80' to '\x9f' and cannot decode the following bytes:

  '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if the file contains utf-8/16 encoded characters, reading it as
cp1252 will sometimes fail.
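
The "sometimes" can be made concrete with a short sketch. The two
characters are arbitrary examples chosen for this illustration: one
whose utf-8 bytes all have a cp1252 mapping, and one whose utf-8 form
contains the undecodable byte 0x8d.

```python
# U+00E9 encodes to C3 A9 in utf-8; U+010D encodes to C4 8D.
outcomes = {}
for ch in ("\u00e9", "\u010d"):
    raw = ch.encode("utf-8")
    try:
        outcomes[ch] = raw.decode("cp1252")  # wrong text, but no error
    except UnicodeDecodeError:
        outcomes[ch] = None                  # hard failure on byte 0x8d
    # latin-1 accepts both byte sequences.
    raw.decode("latin-1")

for ch, decoded in outcomes.items():
    status = "decodes" if decoded is not None else "fails"
    print("U+%04X as cp1252: %s" % (ord(ch), status))
```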

Refer to the following links for details:
  https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
  https://en.wikipedia.org/wiki/Windows-1252
  https://kb.iu.edu/d/aepu
  https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

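The following Python code can be used to verify this. With 'latin-1'
it prints nothing; setting codec to 'cp1252' reports the failures for
that codec.

codec = 'latin-1'

# Try to encode every code point from U+0000 to U+00FF.
for i in range(0x100):
    try:
        chr(i).encode(codec)
    except UnicodeEncodeError:
        print("    %s cannot encode %02x" % (codec, i))

# Try to decode every byte value from 00 to FF.
for i in range(0x100):
    try:
        bytes([i]).decode(codec)
    except UnicodeDecodeError:
        print("    %s cannot decode %02x" % (codec, i))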

This patch adds code to enforce 'latin-1' as the encoding argument
of open() in OpenLongFilePath() when the file is opened in text mode.
Because 'latin-1' can decode any byte, this solves the file decoding
issue completely.
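
A rough, self-contained sketch of the resulting behaviour follows. The
helper open_latin1() is a hypothetical stand-in for OpenLongFilePath()
that omits the long-path handling, and the file content is invented for
the demonstration.

```python
import os
import tempfile

def open_latin1(file_name, mode='r', buffering=-1):
    # Hypothetical stand-in: text mode gets 'latin-1', binary mode gets
    # no encoding (open() rejects an encoding in binary mode).
    encoding = None if 'b' in mode else 'latin-1'
    return open(file_name, mode, buffering, encoding)

# Bytes that would break a utf-8 or cp1252 text read.
raw = b'caf\x92 \x81 \xc4\x8d\n'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_latin1(path, 'wb') as f:
        f.write(raw)
    with open_latin1(path) as f:        # text mode, decoded as latin-1
        text = f.read()
    with open_latin1(path, 'rb') as f:  # binary mode, unchanged
        data = f.read()
finally:
    os.remove(path)

print(repr(text))
```

Note that the text read succeeds but yields one character per byte, so
multi-byte utf-8 sequences are not reassembled into their original
characters.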

Possibly related BZs:
    https://bugzilla.tianocore.org/show_bug.cgi?id=1434
    https://bugzilla.tianocore.org/show_bug.cgi?id=1637
    https://bugzilla.tianocore.org/show_bug.cgi?id=2578
    https://bugzilla.tianocore.org/show_bug.cgi?id=2709
    https://bugzilla.tianocore.org/show_bug.cgi?id=2829

Cc: Bob Feng <bob.c.feng at intel.com>
Cc: Liming Gao <gaoliming at byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen at intel.com>
Signed-off-by: Jian J Wang <jian.j.wang at intel.com>
---
 BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
-- 
2.24.0.windows.2


