[Freeipa-devel] [PATCH] 0748 Handle encoding for ipautil.run

Petr Viktorin pviktori at redhat.com
Tue Nov 24 16:21:07 UTC 2015


On 11/23/2015 10:50 AM, Jan Cholasta wrote:
> On 23.11.2015 07:43, Jan Cholasta wrote:
>> On 19.11.2015 00:55, Petr Viktorin wrote:
>>> On 11/03/2015 02:39 PM, Petr Viktorin wrote:
>>>> Hello,
>>>> Python 3's strings are Unicode, so data coming to or leaving a Python
>>>> program needs to be decoded/encoded if it's to be handled as a string.
>>>> One of the boundaries where encoding is necessary is external programs,
>>>> specifically, ipautil.run.
>>>> Unfortunately there's no one set of options that would work for all
>>>> run() invocations, so I went through them all and specified the
>>>> stdin/stdout/stderr encoding where necessary. I've also updated the
>>>> call
>>>> sites to make it clearer if the return values are used or not.
>>>> If an encoding is not set, run() will accept/return bytes. (This is a
>>>> fail-safe setting, since it can't raise errors, and using bytes where
>>>> strings are expected generally fails loudly in py3.)
>>>>
>>>> Note that the changes are not effective under Python 2.
>>>
>>> ping,
>>> Could someone look at this patch?
>>
>> Looking.
> 
> 1) I think a better approach would be to use str for stdin/stdout/stderr
> by default in both Python 2 and 3, i.e. do nothing in Python 2 and
> encode/decode using locale.getpreferredencoding() in Python 3. This is
> consistent with sys.std{in,out,err} in both Python 2 and 3. Using bytes
> or different encoding should be opt-in.
> 
> Note that different encoding should be used only if the LC_ALL or LANG
> variables are overriden in the command's environment.

That would assume the output of *everything* run via ipautil.run can be
decoded using the locale encoding. Any stray invalid byte would make IPA
crash, even in cases where we don't care about the output. IPA calls too
many weird tools to assume they all output text.

> 2) The str() here is wrong:
> 
> +    arg_string = nolog_replace(' '.join(str(shell_quote(a)) for a in
> args), nolog)
> 
> It converts b'data' to "b'data'" in Python 3.
> 
> Popen accepts both bytes and text in both Python 2 and 3, so there is
> nothing to be done for bytes arguments.

Right, the logging for bytes arguments could be better. Changed.

> 3) I think the "surrogateescape" error handler would be better for
> non-strict decoding, as the text can be encoded back to bytes without
> loss of information.

Non-strict decoding is meant for logging, so we want backslash escapes
rather than invalid surrogates. If you need round-tripping then you
should use bytes.
I've changed the error strategy to 'replace' though, as I realized
'backslashreplace' doesn't work for decoding.

> 4) There are direct calls to subprocess.Popen() in a few places, have
> you checked them as well?

No. I'll do it in another patch; thanks for remembering that.


-- 
Petr Viktorin



-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-pviktori-0748.2-Handle-encoding-for-ipautil.run.patch
Type: text/x-patch
Size: 55905 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20151124/2a89b264/attachment.bin>


More information about the Freeipa-devel mailing list