[libvirt] [PATCH 5/5] po: minimize & canonicalize translations stored in git

Ján Tomko jtomko at redhat.com
Thu Apr 19 07:53:01 UTC 2018


On Thu, Apr 12, 2018 at 02:28:22PM +0100, Daniel P. Berrangé wrote:
>Similar to the libvirt.pot, .po files contain line numbers and file
>names identifying where in the source a translatable string comes from.
>The source locations in the .po files are thrown away and replaced with
>content from the libvirt.pot whenever msgmerge is run, so this is not
>precious information that needs to be stored in git.
>
>When msgmerge processes a .po file, it will add in any msgids from the
>libvirt.pot that were not already present. Thus, if a particular msgid
>currently has no translation, it can be considered redundant and again
>does not need storing in git.
>
>When msgmerge processes a .po file and can't find an exact existing
>translation match, it will try to do fuzzy matching instead, marking such
>entries with a "#, fuzzy" comment to alert the translator to take a
>look and either discard, edit or accept the match. Looking at the
>existing fuzzy matches in .po files shows that the quality is awful,
>with many having a completely different set of printf format specifiers
>between the msgid and fuzzy msgstr entry. Fortunately, when msgfmt
>generates the .gmo, the fuzzy entries are all ignored anyway. The fuzzy
>entries could be useful to translators if they were working on the .po
>files directly from git, but Libvirt outsourced translation to the
>Fedora Zanata system, so keeping fuzzy matches in git is not much help.
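
The mismatch between printf specifiers in msgid and fuzzy msgstr can be checked mechanically. Below is a minimal Python sketch that assumes a simplified C printf grammar: flags, width, precision, length modifier and conversion character. The function names are hypothetical and belong to neither libvirt nor gettext. It does not handle positional specifiers such as "%1$s", which translators may legitimately use to reorder arguments. gettext's own `msgfmt --check-format` performs a fuller check on entries flagged c-format.

```python
import re

# Simplified C printf conversion: flags, width, precision, length
# modifier, conversion character. "%%" (a literal percent) is matched
# by the first alternative so it is never mistaken for a conversion.
_SPEC = re.compile(
    r"%(?:%|[-+ #0']*(?:\*|\d+)?(?:\.(?:\*|\d+))?"
    r"(?:hh|h|ll|l|L|q|j|z|t)?[diouxXeEfFgGaAcspn])"
)


def format_specs(text):
    """Return the printf conversions found in text, in order, minus '%%'."""
    return [m.group(0) for m in _SPEC.finditer(text) if m.group(0) != "%%"]


def specs_match(msgid, msgstr):
    """True if msgstr uses the same conversions as msgid, in the same order.

    Positional specifiers ("%1$s") are not handled by this sketch.
    """
    return format_specs(msgid) == format_specs(msgstr)
```

For example, `specs_match("%s: %d", "%d")` is false. A fuzzy msgstr like that would pass the wrong number of arguments to printf if it were ever used.
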
>
>Finally, by default msgids are sorted based on source location. Thus, if
>a bit of code with translatable text is moved from one file to another,
>it may shift around in the .po file, despite the msgid not itself changing.
>If the msgids were sorted alphabetically, the .po files would have
>stable ordering when code is refactored.
>
>This patch takes advantage of the above observations to canonicalize
>and minimize the content stored for .po files in git. Instead of storing
>the real .po files, we now store .mini.po files.
>
>The .mini.po files are the same file format as .po files, but have no
>source location comments, are sorted alphabetically, and all fuzzy
>msgstrs and msgids with no translation are discarded. This cuts the size
>of content in the po directory from 109MB to 19MB.
>
>Users working from a libvirt git checkout who need the full .po files
>can run "make update-po", which merges the libvirt.pot and .mini.po
>file to create a .po file containing all the content previously stored
>in git.
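
The reverse direction, "make update-po", presumably relies on msgmerge. The Python sketch below illustrates only the exact-match part of that merge. It walks the template's msgids in template order and reattaches location comments. Each msgid takes its translation from the .mini.po data if one exists, otherwise an empty msgstr, which is exactly what minimization discarded. The names and the (locations, msgid) input shape are hypothetical, and the header entry is omitted for brevity. Real msgmerge also attempts fuzzy matching unless `--no-fuzzy-matching` is given. It also keeps translations whose msgid has left the template as obsolete "#~" entries rather than dropping them.

```python
def quote(s):
    """Render a str as a one-line PO string literal.

    Sketch only: escapes backslash, double quote and newline.
    """
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def merge_po(pot_entries, mini):
    """Rebuild full .po text (header omitted).

    pot_entries: list of (locations, msgid) pairs in template order.
    mini: dict msgid -> msgstr taken from the .mini.po file.
    """
    out = []
    for locations, msgid in pot_entries:
        lines = ["#: " + " ".join(locations)] if locations else []
        lines.append("msgid " + quote(msgid))
        # Untranslated msgids come back with an empty msgstr.
        lines.append("msgstr " + quote(mini.get(msgid, "")))
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"
```
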
>
>Conversely, if a full .po file has been modified, for example, by
>downloading new content from Zanata, the .mini.po files can be updated
>by running "make update-mini-po". The resulting diffs of the .mini.po
>file will clearly show the changed translations without any of the noise
>that previously obscured content. Being able to see content changes
>clearly actually identified a bug in the zanata python client where it
>was adding bogus "fuzzy" annotations to many messages:
>
>  https://bugzilla.redhat.com/show_bug.cgi?id=1564497
>
>Users working from libvirt releases should not see any difference in
>behaviour, since the tarballs only contain the full .po files, not the
>.mini.po files.
>
>As an added benefit, generating tarballs with "make dist" will no
>longer cause creation of dirty files in git, since it won't touch the
>.mini.po files, only the .po files which are no longer kept in git.
>
>To avoid creating a single commit 100+MB in size, each language is
>minimized separately in a following commit.

From a brief look at those commits, the few Slovak "translations" are all
in English, and many of the translation team pages still point to Transifex,
but I assume that data comes from Zanata.

>
>Signed-off-by: Daniel P. Berrangé <berrange at redhat.com>
>---
> .gitignore               |  3 +++
> build-aux/minimize-po.pl | 37 +++++++++++++++++++++++++++++++++
> po/Makefile.am           | 30 ++++++++++++++-------------
> po/README.md             | 53 +++++++++++++++++++++++++++++++++++++++++-------
> 4 files changed, 102 insertions(+), 21 deletions(-)
> create mode 100755 build-aux/minimize-po.pl
>

Reviewed-by: Ján Tomko <jtomko at redhat.com>

Jano