[Freeipa-devel] [PATCH] 0748 Handle encoding for ipautil.run

Wed Nov 25 10:04:30 UTC 2015

On 24.11.2015 17:21, Petr Viktorin wrote:
> On 11/23/2015 10:50 AM, Jan Cholasta wrote:
>> On 23.11.2015 07:43, Jan Cholasta wrote:
>>> On 19.11.2015 00:55, Petr Viktorin wrote:
>>>> On 11/03/2015 02:39 PM, Petr Viktorin wrote:
>>>>> Hello,
>>>>> Python 3's strings are Unicode, so data coming to or leaving a Python
>>>>> program needs to be decoded/encoded if it's to be handled as a string.
>>>>> One of the boundaries where encoding is necessary is external programs,
>>>>> specifically, ipautil.run.
>>>>> Unfortunately there's no one set of options that would work for all
>>>>> run() invocations, so I went through them all and specified the
>>>>> stdin/stdout/stderr encoding where necessary. I've also updated the
>>>>> call
>>>>> sites to make it clearer if the return values are used or not.
>>>>> If an encoding is not set, run() will accept/return bytes. (This is a
>>>>> fail-safe setting, since it can't raise errors, and using bytes where
>>>>> strings are expected generally fails loudly in py3.)
>>>>>
>>>>> Note that the changes are not effective under Python 2.
>>>>
>>>> ping,
>>>> Could someone look at this patch?
>>>
>>> Looking.
>>
>> 1) I think a better approach would be to use str for stdin/stdout/stderr
>> by default in both Python 2 and 3, i.e. do nothing in Python 2 and
>> encode/decode using locale.getpreferredencoding() in Python 3. This is
>> consistent with sys.std{in,out,err} in both Python 2 and 3. Using bytes
>> or different encoding should be opt-in.
>>
>> Note that different encoding should be used only if the LC_ALL or LANG
>> variables are overriden in the command's environment.
>
> That would assume the output of *everything* run via ipautil.run can be
> decoded using the locale encoding. Any stray invalid byte would make IPA
> crash, even in cases where we don't care about the output. IPA calls too
> many weird tools to assume they all output text.

Such a stray invalid byte may bubble through our code and cause havoc 
somewhere else, which is much harder to troubleshoot than a crash in 
ipautil.run(), where you can see that it was a misbehaving command that 
caused the crash and exactly what command it was.

If we don't care about output somewhere, we should not capture it there 
at all.

>> 3) I think the "surrogateescape" error handler would be better for
>> non-strict decoding, as the text can be encoded back to bytes without
>> loss of information.
>
> Non-strict decoding is meant for logging, so we want backslash escapes
> rather than invalid surrogates. If you need round-tripping then you
> should use bytes.

I see. There is no guarantee that non-strict output will be used only 
for logging though.

I think it might be better to return raw output and let the caller use a 
decode-for-logging utility function instead. The logger already escapes 
invalid bytes itself, so the function would have to be used only when 
raising exceptions.

> I've changed the error strategy to 'replace' though, as I realized
> 'backslashreplace' doesn't work for decoding.

-- 
Jan Cholasta