[Freeipa-devel] [PATCH] 0748 Handle encoding for ipautil.run

Fri Dec 4 09:53:54 UTC 2015

On 12/04/2015 08:51 AM, Jan Cholasta wrote:
> On 3.12.2015 15:42, Petr Viktorin wrote:
>> On 12/03/2015 10:55 AM, Jan Cholasta wrote:
>> [...]
>>>>>>>>> If we don't care about output somewhere, we should not capture it
>>>>>>>>> there
>>>>>>>>> at all.
>>>>>>>>
>>>>>>>> Then people need to remember to put "capture_output=True"
>>>>>>>> everywhere
>>>>>>>> (except that also disables logging of the output, so we'd need an
>>>>>>>> additional option).
>>>>>>>
>>>>>>> capture_output=True is the default and capture_output=False
>>>>>>> should be
>>>>>>> set where output is not needed.
>>>>>>
>>>>>> "capture_output=True" being the default would mean that it's still
>>>>>> possible to leave "capture_output=False" out, but ignore the
>>>>>> output. It
>>>>>> would make the wrong usage easier to type than the right one,
>>>>>> which is
>>>>>> something to avoid.
>>>>>
>>>>> I'm not sure I understand what are you trying to say here.
>>>>>
>>>>> My point is that capture_output=True is currently the default (see for
>>>>> yourself) and it is indeed possible to leave capture_output=False but
>>>>> ignore the output. I believe this is wrong and that you should always
>>>>> have to explicitly request the output to be captured.
>>>>
>>>> OK, looks like I will need to refactor ipautil.run even for Python 2,
>>>> then.
>>>
>>> If this is just about changing the default and fixing all run()
>>> invocations, I can provide the patch myself, if that would work better
>>> for you.
>>>
>>>> I seem to have have problems coming up with a solution that would match
>>>> your expectations, so let me write the semantics as I understand you'd
>>>> like hem:
>>>>
>>>> Would you be happy with the following signature?
>>>>
>>>> def run(args, stdin=None, raiseonerr=True,
>>>>           nolog=(), env=None,
>>>>           capture_output=False, skip_output=False,
>>>>           cwd=None, runas=None, timeout=None, suplementary_groups=[],
>>>>           capture_stderr=False,
>>>>           encode_stdout=True,
>>>>           encode_stderr=True,
>>>>          ):
>>>>
>>>> Output and error would be always logged, except if skip_output is set.
>>>> But they would be only returned if capture_output/capture_stderr were
>>>> set.
>>>
>>> I would personally also rename capture_output to capture_stdout. If we
>>> change the default, we have to update its every use, so we might as well
>>> rename it. (Again, I can provide the patch.)
>>
>> If we agree on the semantics, writing the patch isn't that big a deal.
> 
> Right.
> 
>>
>>>> If encode_* is False, bytes are returned, by default
>>>> locale.getpreferredencoding() is used. (If the caller wants any other
>>>> encoding, it can do that by itself.)
>>>
>>> I thought bytes should be the default after all? See my comment in the
>>> previous mail.
>>
>> If the default is no capturing, I'd rather go with text by default.
>> With capturing by deafult, "run(args)" could raise encoding errors *if
>> the output is then ignored*. If it's "run(args, capture_stdout=True)", I
>> think it's fine to encode.
> 
> OK, makes sense.
> 
>>
>>>> stdin it encoded using locale.getpreferredencoding() if it's unicode.
>>>
>>> If we do encode/decode in run(), I think there should be a way to
>>> override the default encoding. My idea was to either accept/return only
>>> bytes and let the caller handle encoding themselves, or to handle
>>> encoding in run(), but be able to override the defaults.
>>>
>>> Given we handle encoding in run(), I would imagine it like this:
>>>
>>>      def run(args, stdin=None, raiseonerr=True, nolog=(), env=None,
>>>              capture_stdout=False, skip_output=False, cwd=None,
>>>              runas=None, timeout=None, suplementary_groups=[],
>>>              capture_stderr=False, encode_stdout=False,
>>>              encode_stderr=False, encoding=None):
>>>
>>> i.e. nothing is captured by default, when stdout or stderr are captured
>>> they are returned as bytes by default, when stdout or stderr are
>>> returned as text they are decoded using the locale encoding by default,
>>> or the encoding can be explicitly specified.
>>>
>>> Do you think this is feasible?
>>
>> Feasible, yes.
>> In the majority of cases where the output is needed, we want text
>> encoded with locale.getpreferredencoding(), so I'd rather make that the
>> default rather than bytes.
>>
>> Or we could make "encode_stdout" imply "capture_stdout", so you wouldn't
>> have to always specify both. (It would be better named "encoded_stdout"
>> in that case.)
> 
> I think it should be OK to decode by default. (IMO it would be best to
> reverse the logic and name it "raw_stdout", with False default. Do we
> need "raw_stderr" at all? I don't think we do.)

Actually, now that I think about it: if the result was a namedtuple
subclass, we can always set extra raw_stdout/raw_stderr attributes on
it. (Unless skip_output=True, of course.)

   result = run(['generate_binary_data'])
   use(result.raw_stdout)

   # and for backcompat,
   rc, _out, _err = result

> So the signature will be this:
> 
>       def run(args, stdin=None, raiseonerr=True, nolog=(), env=None,
>               capture_stdout=False, skip_output=False, cwd=None,
>               runas=None, timeout=None, suplementary_groups=[],
>               capture_stderr=False, raw_stdout=False, encoding=None):
>
> Then, when you want to capture just error messages on stderr:
> 
>     rc, _out, err = run(args,
>                         capture_stderr=True)

With namedtuple:

    result = run(args, capture_stderr=True, raiseonerr=False)
    if result.rc != 0:
        complain(result.err)

> When you want to capture text on both stdout and stderr:
> 
>     rc, out, err = run(args,
>                        capture_stdout=True,
>                        capture_stderr=True)
> 
> When you want to capture bytes on stdout and text on stderr (think
> "certutil -L -r"):
> 
>     rc, out, err = run(args,
>                        capture_stdout=True,
>                        capture_stderr=True,
>                        raw_stdout=True)

    result = run(args, capture_stderr=True)
    data = result.raw_stdout

> When you want to capture text on both stdout and stderr, where the text
> on stdout uses different encoding (think your wget example from before):
> 
>     rc, out, err = run(args,
>                        capture_stdout=True,
>                        capture_stderr=True,
>                        raw_stdout=True)
>     out = out.decode('utf-8')

    result = run(args, capture_stderr=True)
    out = result.raw_stdout.decode('utf-8')

> And when you want to force the "C" locale:
> 
>     rc, _out, err = run(args,
>                         env={'LC_ALL': 'C'},
>                         capture_stderr=True,
>                         encoding='ascii')

It's more complicated if the existing env matters:

    env = dict(os.environ)
    env['LC_ALL'] = 'C'
    result = run(args, env=env, capture_stderr=True, encoding='ascii')

-- 
Petr Viktorin