[Freeipa-devel] [PATCH] 0748 Handle encoding for ipautil.run

Petr Viktorin pviktori at redhat.com
Wed Nov 25 14:47:36 UTC 2015


On 11/25/2015 11:04 AM, Jan Cholasta wrote:
> On 24.11.2015 17:21, Petr Viktorin wrote:
>> On 11/23/2015 10:50 AM, Jan Cholasta wrote:
>>> On 23.11.2015 07:43, Jan Cholasta wrote:
>>>> On 19.11.2015 00:55, Petr Viktorin wrote:
>>>>> On 11/03/2015 02:39 PM, Petr Viktorin wrote:
>>>>>> Hello,
>>>>>> Python 3's strings are Unicode, so data coming to or leaving a Python
>>>>>> program needs to be decoded/encoded if it's to be handled as a
>>>>>> string.
>>>>>> One of the boundaries where encoding is necessary is external
>>>>>> programs,
>>>>>> specifically, ipautil.run.
>>>>>> Unfortunately there's no one set of options that would work for all
>>>>>> run() invocations, so I went through them all and specified the
>>>>>> stdin/stdout/stderr encoding where necessary. I've also updated the
>>>>>> call
>>>>>> sites to make it clearer if the return values are used or not.
>>>>>> If an encoding is not set, run() will accept/return bytes. (This is a
>>>>>> fail-safe setting, since it can't raise errors, and using bytes where
>>>>>> strings are expected generally fails loudly in py3.)
>>>>>>
>>>>>> Note that the changes are not effective under Python 2.
>>>>>
>>>>> ping,
>>>>> Could someone look at this patch?
>>>>
>>>> Looking.
>>>
>>> 1) I think a better approach would be to use str for stdin/stdout/stderr
>>> by default in both Python 2 and 3, i.e. do nothing in Python 2 and
>>> encode/decode using locale.getpreferredencoding() in Python 3. This is
>>> consistent with sys.std{in,out,err} in both Python 2 and 3. Using bytes
>>> or different encoding should be opt-in.
>>>
>>> Note that different encoding should be used only if the LC_ALL or LANG
>>> variables are overriden in the command's environment.
>>
>> That would assume the output of *everything* run via ipautil.run can be
>> decoded using the locale encoding. Any stray invalid byte would make IPA
>> crash, even in cases where we don't care about the output. IPA calls too
>> many weird tools to assume they all output text.
> 
> Such a stray invalid byte may bubble through our code and cause havoc
> somewhere else, which is much harder to troubleshoot than a crash in
> ipautil.run(), where you can see that it was a misbehaving command that
> caused the crash and exactly what command it was.

The invalid byte can't bubble through code, because if you don't specify
an encoding you get bytes back. You'll get a crash when you try using it
as a string.

locale.getpreferredencoding() is not used consistently in all software,
especially if it's not written in Python. For instance, wget won't
magically re-encode the data it fetches for us. It's better to
explicitly specify the encoding every time.

> If we don't care about output somewhere, we should not capture it there
> at all.

Then people need to remember to put "capture_output=True" everywhere
(except that also disables logging of the output, so we'd need an
additional option).

>>> 3) I think the "surrogateescape" error handler would be better for
>>> non-strict decoding, as the text can be encoded back to bytes without
>>> loss of information.
>>
>> Non-strict decoding is meant for logging, so we want backslash escapes
>> rather than invalid surrogates. If you need round-tripping then you
>> should use bytes.
> 
> I see. There is no guarantee that non-strict output will be used only
> for logging though.
> 
> I think it might be better to return raw output and let the caller use a
> decode-for-logging utility function instead. The logger already escapes
> invalid bytes itself, so the function would have to be used only when
> raising exceptions.

It's also used for logging at a higher level (info-critical).
But this makes sense; changed.

-- 
Petr Viktorin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-pviktori-0748.3-Handle-encoding-for-ipautil.run.patch
Type: text/x-patch
Size: 59069 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20151125/cc501b4e/attachment.bin>


More information about the Freeipa-devel mailing list