[edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Wang, Jian J
jian.j.wang at intel.com
Fri Oct 16 07:41:24 UTC 2020
The build tool reports a failure on file read, for example when calling
trim to clean preprocessed source files, if the tool is running on an OS
with a non-western code page and the source file contains non-ascii
characters. Even utf-8 has problems when it encounters some bytes used
by cp1252 (such as 0x92, 0x96, 0xa0, etc).
Currently, the safest way to read a file in python code is to use
'latin-1' (iso-8859-1), because it maps every byte between 00-FF to a
character and therefore cannot cause an encoding/decoding issue. It
behaves almost the same as reading the file in binary mode.
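The behaviour described above can be checked with a short sketch. It assumes Python 3; the helper name read_text_as_latin1 and the scratch file are illustrative only and are not part of BaseTools.

```python
import os
import tempfile

def read_text_as_latin1(path):
    # Hypothetical helper: newline='' disables newline translation,
    # so each character of the result corresponds to exactly one byte.
    with open(path, 'r', encoding='latin-1', newline='') as f:
        return f.read()

# Write every possible byte value to a scratch file.
raw = bytes(range(0x100))
fd, tmp_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(raw)

    # latin-1 decodes each byte to the code point with the same value.
    text = read_text_as_latin1(tmp_path)
    round_trip_ok = text.encode('latin-1') == raw

    # The same content is not valid utf-8, so a utf-8 read fails.
    try:
        with open(tmp_path, 'r', encoding='utf-8') as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
finally:
    os.remove(tmp_path)
```

Without newline='', text mode also translates line endings, which is one way a latin-1 text read still differs from a true binary read.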
cp1252 is similar to latin-1, but it cannot encode the characters
'\x80' to '\x9f' and cannot decode the following bytes:
'\x81', '\x8d', '\x8f', '\x90', '\x9d'
So if the file contains such bytes, for example as part of utf-8/16
encoded characters, reading it will sometimes fail.
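The cp1252 gaps mentioned above can be listed with a sketch like the following. It assumes Python 3 and its built-in cp1252 codec; the function names are made up for illustration.

```python
def undecodable_bytes(codec):
    # Collect every single byte value the codec refuses to decode.
    bad = []
    for i in range(0x100):
        try:
            bytes([i]).decode(codec)
        except UnicodeDecodeError:
            bad.append(i)
    return bad

def unencodable_code_points(codec):
    # Collect every code point below 0x100 the codec refuses to encode.
    bad = []
    for i in range(0x100):
        try:
            chr(i).encode(codec)
        except UnicodeEncodeError:
            bad.append(i)
    return bad

cp1252_undecodable = undecodable_bytes('cp1252')
cp1252_unencodable = unencodable_code_points('cp1252')
latin1_undecodable = undecodable_bytes('latin-1')

print("cp1252 cannot decode:", ' '.join('%02x' % b for b in cp1252_undecodable))
print("cp1252 cannot encode:", ' '.join('%02x' % c for c in cp1252_unencodable))
print("latin-1 cannot decode:", ' '.join('%02x' % b for b in latin1_undecodable))
```

Running the same two helpers with 'latin-1' gives a direct side-by-side comparison of the two code pages.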
Refer to the following links for details:
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
https://en.wikipedia.org/wiki/Windows-1252
https://kb.iu.edu/d/aepu
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
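One can use the following python code to verify this.
    # Every code point below 0x100 should encode with latin-1.
    for i in range(0x100):
        try:
            chr(i).encode('latin-1')
        except UnicodeError:
            print(" %s cannot encode %02x" % ('latin-1', i))
    # Every single byte value should decode with latin-1.
    for i in range(0x100):
        try:
            b = bytes([i])
            b.decode('latin-1')
        except UnicodeError:
            print(" %s cannot decode %02x" % ('latin-1', i))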
This patch adds code to enforce 'latin-1' as the encoding argument
of open() in function OpenLongFilePath() whenever the file is opened in
text mode. This avoids the file decoding failure entirely, since latin-1
can decode any byte.
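The mode check can be illustrated in isolation with a minimal sketch. The names pick_encoding and open_with_forced_encoding are hypothetical stand-ins, not BaseTools functions, and the sketch leaves out the long-path handling done by LongFilePath().

```python
def pick_encoding(mode):
    # Hypothetical helper mirroring the patch: binary modes get no
    # encoding, every text mode is forced to latin-1.
    return None if 'b' in mode else 'latin-1'

def open_with_forced_encoding(path, mode='r', buffering=-1):
    # The fourth positional parameter of the built-in open() is 'encoding'.
    return open(path, mode, buffering, pick_encoding(mode))

for mode in ('r', 'w', 'r+', 'rb', 'wb', 'ab+'):
    print("%-3s -> %r" % (mode, pick_encoding(mode)))
```

The None branch matters because the built-in open() raises ValueError when an encoding is passed together with a binary mode.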
Possibly related BZs:
https://bugzilla.tianocore.org/show_bug.cgi?id=1434
https://bugzilla.tianocore.org/show_bug.cgi?id=1637
https://bugzilla.tianocore.org/show_bug.cgi?id=2578
https://bugzilla.tianocore.org/show_bug.cgi?id=2709
https://bugzilla.tianocore.org/show_bug.cgi?id=2829
Cc: Bob Feng <bob.c.feng at intel.com>
Cc: Liming Gao <gaoliming at byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen at intel.com>
Signed-off-by: Jian J Wang <jian.j.wang at intel.com>
---
BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
--
2.24.0.windows.2
View/Reply Online (#66316): https://edk2.groups.io/g/devel/message/66316