[Freeipa-devel] [PATCH 66] Replace broken i18n shell test with Python test

Fri Mar 23 09:59:42 UTC 2012

On 03/22/2012 10:13 PM, John Dennis wrote:
> On 03/22/2012 02:59 PM, Petr Viktorin wrote:
>> On 03/22/2012 06:01 PM, John Dennis wrote:
.. snip ..
>>
>>> When I ran the validation on our complete set of files a lot of errors
>>> were reported for
>>>
>>> ipalib/plugins/hbactest.py:27
>>> msgid has whitespace between $ and variable in '$ ipa '
>>>
>>> This is due to a doc string with shell command examples, the $
>>> is the shell prompt. The validation code looks for $ followed by a word
>>> token (these are template substitutions). We have a lot of variable and
>>> template substitutions in our strings, any of which could be malformed
>>> and cause serious run-time problems. I don't know of any way to have the
>>> validation logic ignore the use of $ as a shell prompt without adding a
>>> special case exception. Since shell prompts are rarity in strings my
>>> suggestion would be to pick a different shell prompt character,
>>> something other than $ and be done with the issue and not try to
>>> over-engineer the validation logic.
>>
>> No need to test this case -- if there's whitespace, it's not a shell
>> substitution.
>> If it's wrong in English, we should notice.
>
> Unfortunately reality is different, people make mistakes. In fact I've
> already had to fix a mistake like this previously. To the best of my
> knowledge we currently do not have anything which validates
> substitutions in the source code.
>
> There is actually very little difference in between pot files and po
> files. The syntax checker was originally written to validate po files,
> the po files derive from the pot file. If we can prevent errors from
> creeping into the pot file before each translator copies the error into
> their respective po file we've saved both ourselves and the translators
> work. So it makes sense to me to run the syntax checks on the pot file
> as well.
>

The problem is that the messages are validated as if anything that looks 
like a substitution would be substituted, regardless of actual usage. 
There can be false positives, requiring workarounds just to make the 
tests pass. Having to use a non-obvious shell prompt is a good example.
I'd like to keep the potential false positives to a minimum. Shell 
substitution with a "bare" variable doesn't even occur in our messages. 
I haven't seen $(var) either. Are these really necessary?

You can't catch all possible cases where an exception might be raised. 
For example I can add '%,' or even an extra '%s' into a string, which 
the validator won't catch. For this reason I'd prefer, for English, 
beefing up regular tests rather than running heuristic validation after 
the context is lost.

>> If it's wrong in another
>> language, it should be caught by checking if the translation contains
>> all necessary substitutions.
>
> In fact that is exactly one of the checks which are performed. But the
> check depends on it being correct in the pot file. If it's wrong in the
> pot file too we'll miss the error, another reason to validate the pot
> file as well.

>> (In fact, all these syntax tests are redundant. But they do make nice
>> error messages.)
>
> I don't understand how they are redundant.

My reasoning was that English should be validated by other tests, and 
translations by checking if the same substitutions occur. But I'm 
willing to accept that the existing tests aren't adequate.

>>>
>>
>> Some of the option parsing code is reinventing the wheel a bit. The rest
>> of the project uses optparse.OptionParser, it'd be nice to switch to
>> that.
>> Particularly the global config dict is a bad idea. Don't assume all of
>> this will only ever run as a script. Give the constants their own
>> variables (and advertise that they're constants, as per
>> http://www.python.org/dev/peps/pep-0008/#constants).
>
> O.K. I can change how options are parsed (although getopt is a standard
> Python module). I can also modify how the parameters are stored but to
> my mind it makes perfect sense to have them be part of a configuration
> block (dict). If this code were ever used outside this script passing a
> config dict to control it would seem to make sense. Plus the idea was at
> some point we might want a command line option to change one of them,
> the approach is flexible in that regard.

Optparse is also a standard module, and although deprecated, it's much 
better than getopt for this purpose.
OptionParser gives you an "options" object that you can pass around, 
exactly as you describe -- except it's not a global variable, so e.g. 
changing things won't affect subsequent runs.

About optparse deprecation -- the reason to deprecate it was that its 
API isn't very extensible, but since we don't extend it we might as well 
continue with it for now. The switch is relatively straightforward 
if/when we want to do it.
Argparse is not standard in Python <=2.6, so we'd need another external 
dependency if we want to use it.

>> In validate_substitutions_match, you may be over-validating. Why does a
>> given substitution need to appear the same number of times in both
>> strings? Making sure translators include all needed info is nice, but
>> outside of that, give them some freedom.
>
> O.K. good point, appearing the same number of times might be too
> restrictive.
>
>>
>> A nice sanity test I've seen in Qt is checking if the original and
>> translation strings end in the same punctuation ('.', '?', '!', ':').
>> Would it make sense to implement that?
>
> I'm not sure about that, I'm not a linguist but different languages use
> different punctuation and order things differently. I'd rather leave
> that to the translator to get right.
>
> These checks are not designed to assure a translation is linguistically
> correct, rather their prime indent is to assure they are programatically
> correct. What we don't want is to throw exceptions at run time for a
> particular language because the translator mangled what is essentially
> program variables. I've chased down these run time problems more than
> once in both IPA and Dogtag (Certificate Server). I've seen cases where
> the entire application aborts because of a bad translation, we really
> want to catch those problems up front because our normal test procedures
> won't.

Right, I understand the scope now. This is better left as Transifex 
warnings, if they have those.

>> I saw some style issues, e.g.:
>> - in places such as this:
>> > raise ValueError("Translated string \"%s\" minus the first& last
>> character is not equal to msgid \"%s\"" % \
>> > (translated.encode('utf-8'), msgid))
>>
>> line continuation backslashes are unnecessary inside parentheses.
>> Also you can mix single and double quotes to avoid backslashes in
>> strings.
>> - {'':'', '(':')', '{':'}'} should have spaces after the colons

After colons only, not before
{'': '', '(': ')', '{': '}'}

>> - in entry_seperator = '-' * page_width there's too much whitespace
>
> Yes, all good style issues, I'll fix them.
>
>>
>> All of this is quite a lot of nontrivial code; some tests for the
>> validation itself would be nice.
>
> Agreed, but I'm not sure adding more work to this is appropriate. I
> suspect both Rob and Dmitri would suggest time is better spent elsewhere.

I suspect otherwise, there was a push for better testing recently. 
Better ask them directly, though.

>> All in all, nice work! I hope I don't sound too negative, I really like
>> where this is going.
>>
>
> Thank you and thank you for a though review. I'll update the patch
> shortly to address the issues you raised.
>

More nitpicks:
Please add the generated files to .gitignore

 > print >>sys.stderr, "ERROR: you may not select -c or -t when 
validating substitutions"
Why not? Wouldn't it make sense to run all the tests at once?

If -s is given with no arguments, could it validate all .po files in the 
translation directory?

-- 
Petr³