[Freeipa-devel] [PATCH 66] Replace broken i18n shell test with Python test

John Dennis jdennis at redhat.com
Thu Mar 22 21:13:39 UTC 2012


On 03/22/2012 02:59 PM, Petr Viktorin wrote:
> On 03/22/2012 06:01 PM, John Dennis wrote:
>> This addresses tickets 2044 and 1958. Ticket 2044 was on my short list
>> but the code had already been written for ticket 1958 (validation of
>> substitution variables). It had been sitting in my home directory
>> waiting for integration so since the original 2044 ticket required me to
>> update test_i18n.py it seemed like a perfect opportunity to get ticket
>> 1958 done as well, it only added 2 hours of additional work.
>
> This is great!
>
>> Summary:
>>
>> In the install/po directory you can now do this:
>>
>> % make test
>> % make validate
>>
>> Details (aside from what is in the commit message):
>>
>> Also attached (to prevent line wrapping) is an example of validating the
>> Spanish translation. It gives a good example of the type of errors it's
>> capable of finding. This is not abstract, mistakes like these caused us
>> to throw errors in different locales and since we normally test only
>> with English it's important to assure the other languages do not cause
>> Python errors.
>>
>> We should consider adding the i18n validation to our top level
>> validation logic so don't allow problems to creep in (our original
>> non-translated strings are checked too and are just as capable of having
>> errors as the translated strings).
>
> Errors in English strings should be caught by regular tests. But yes, if
> we ship translations in the repo, the main test suite should check them.
> (If it doesn't take minutes to run).
>
>> When I ran the validation on our complete set of files a lot of errors
>> were reported for
>>
>> ipalib/plugins/hbactest.py:27
>> msgid has whitespace between $ and variable in '$ ipa '
>>
>> This is due to a doc string with shell command examples, the $
>> is the shell prompt. The validation code looks for $ followed by a word
>> token (these are template substitutions). We have a lot of variable and
>> template substitutions in our strings, any of which could be malformed
>> and cause serious run-time problems. I don't know of any way to have the
>> validation logic ignore the use of $ as a shell prompt without adding a
>> special case exception. Since shell prompts are rarity in strings my
>> suggestion would be to pick a different shell prompt character,
>> something other than $ and be done with the issue and not try to
>> over-engineer the validation logic.
>
> No need to test this case -- if there's whitespace, it's not a shell
> substitution.
> If it's wrong in English, we should notice.

Unfortunately reality is different, people make mistakes. In fact I've 
already had to fix a mistake like this previously. To the best of my 
knowledge we currently do not have anything which validates 
substitutions in the source code.

There is actually very little difference in between pot files and po 
files. The syntax checker was originally written to validate po files, 
the po files derive from the pot file. If we can prevent errors from 
creeping into the pot file before each translator copies the error into 
their respective po file we've saved both ourselves and the translators 
work. So it makes sense to me to run the syntax checks on the pot file 
as well.

> If it's wrong in another
> language, it should be caught by checking if the translation contains
> all necessary substitutions.

In fact that is exactly one of the checks which are performed. But the 
check depends on it being correct in the pot file. If it's wrong in the 
pot file too we'll miss the error, another reason to validate the pot 
file as well.

> (In fact, all these syntax tests are redundant. But they do make nice
> error messages.)

I don't understand how they are redundant.

>>
>
> Some of the option parsing code is reinventing the wheel a bit. The rest
> of the project uses optparse.OptionParser, it'd be nice to switch to that.
> Particularly the global config dict is a bad idea. Don't assume all of
> this will only ever run as a script. Give the constants their own
> variables (and advertise that they're constants, as per
> http://www.python.org/dev/peps/pep-0008/#constants).

O.K. I can change how options are parsed (although getopt is a standard 
Python module). I can also modify how the parameters are stored but to 
my mind it makes perfect sense to have them be part of a configuration 
block (dict). If this code were ever used outside this script passing a 
config dict to control it would seem to make sense. Plus the idea was at 
some point we might want a command line option to change one of them, 
the approach is flexible in that regard.
>
>
> In validate_substitutions_match, you may be over-validating. Why does a
> given substitution need to appear the same number of times in both
> strings? Making sure translators include all needed info is nice, but
> outside of that, give them some freedom.

O.K. good point, appearing the same number of times might be too 
restrictive.

>
> A nice sanity test I've seen in Qt is checking if the original and
> translation strings end in the same punctuation ('.', '?', '!', ':').
> Would it make sense to implement that?

I'm not sure about that, I'm not a linguist but different languages use 
different punctuation and order things differently. I'd rather leave 
that to the translator to get right.

These checks are not designed to assure a translation is linguistically 
correct, rather their prime indent is to assure they are programatically 
correct. What we don't want is to throw exceptions at run time for a 
particular language because the translator mangled what is essentially 
program variables. I've chased down these run time problems more than 
once in both IPA and Dogtag (Certificate Server). I've seen cases where 
the entire application aborts because of a bad translation, we really 
want to catch those problems up front because our normal test procedures 
won't.

>
>
> I saw some style issues, e.g.:
> - in places such as this:
>     >  raise ValueError("Translated string \"%s\" minus the first&  last
> character is not equal to msgid \"%s\"" % \
>     >   (translated.encode('utf-8'), msgid))
>
>     line continuation backslashes are unnecessary inside parentheses.
> Also you can mix single and double quotes to avoid backslashes in strings.
> - {'':'', '(':')', '{':'}'} should have spaces after the colons
> - in entry_seperator   = '-' * page_width there's too much whitespace

Yes, all good style issues, I'll fix them.

>
> All of this is quite a lot of nontrivial code; some tests for the
> validation itself would be nice.

Agreed, but I'm not sure adding more work to this is appropriate. I 
suspect both Rob and Dmitri would suggest time is better spent elsewhere.
>
>
> All in all, nice work! I hope I don't sound too negative, I really like
> where this is going.
>

Thank you and thank you for a though review. I'll update the patch 
shortly to address the issues you raised.

-- 
John Dennis <jdennis at redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/




More information about the Freeipa-devel mailing list