Planning a future L10N infrastructure (including Fedora)

Mon Sep 15 06:09:22 UTC 2008

Hi infrastructure wranglers,

(cc transifex-devel)

Over the last few months, a few of us involved in Red Hat L10N engineering
have discussed how best to ensure we have localisation infrastructure and
tools that can serve the needs of Red Hat, JBoss, Fedora and 'upstream'
communities in years to come. Let me first describe some of the background and
requirements behind this project:

Up until now, we have managed translations through version control systems
such as CVS, SVN and Git. This has ensured that all contributions are pushed
upstream, as we always store translations within the upstream repositories and
projects. 'Damned Lies' further gave us a tool to view language-specific
translation statistics for modules, branches and releases, as well as
convenient information about people, teams and projects. This has been a great
help for translators in their work. The work of Dimitris and others on
Transifex has in addition given the translation community a way to submit
translations upstream without ever touching a developer-centric version
control system, which has also been of great help to translators.

Some of the immediate needs that could be addressed within the existing 
framework (some of which are on the Transifex roadmap) are:
- Consolidation of Damned Lies and Transifex, allowing translations to be
retrieved and submitted through the same interface
- Retrieving and submitting multiple files at once (e.g. for translating a
Publican document with many PO files)
- Simple workflow on top of Transifex (porting features from Vertimus)
- Better usability and an easier user registration process (Fedora-specific)

Transifex is gaining some traction upstream (e.g. within GNOME), and we hope
development will continue strongly, serving Fedora and potentially other
upstream communities.

Looking at the bigger picture, some of the core requirements we have identified 
for Red Hat and community L10N going forward are:
- Customizable Translation Workflows and integration with e.g. Content 
Authoring Workflows
- Infrastructure easily adaptable to support new file formats and project
types (e.g. OpenOffice formats, CMS formats, DTP formats, Wiki, DITA, Java
formats), rather than relying on 'upstream' projects to fit a certain L10N
infrastructure
- Managing the life-cycle of a translation project across releases and 
iterations
- Translation Reuse and Terminology Management across projects and iterations
- Job management, scoping, tracking and resourcing
- Managing and/or Tracking upstream translation projects, pushing changes back 
upstream. 

These requirements call for a system where the translation lifecycle is
managed within 'Translation Repositories' (similar to e.g. Pootle or Launchpad
Translations), rather than directly through e.g. upstream version control
systems. With a repository-based approach, we would be able to track and
manage changes to a project at the translation unit level, and manage e.g.
translation reuse and terminology within and across projects. We could still
retain a link with upstream repositories (as with Transifex/Damned Lies).
However, this link would not be part of the 'core data model'; it would sit on
a separate layer, implemented through plug-ins. Such a link to external
repositories could also go beyond traditional version control systems,
communicating with external sources like wikis and CMSs.
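
To make the repository-based idea a little more concrete, here is a minimal
sketch of how translation units, cross-project reuse and an upstream plug-in
layer might fit together. It is written in Python purely for brevity and is
not the proposed Java stack; every name in it (UnitKey, TranslationRepository,
UpstreamPlugin, RecordingPlugin, the project names) is hypothetical and does
not come from Transifex, Damned Lies, Pootle or any other existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class UnitKey:
    """Identifies one translation unit (hypothetical core data model)."""
    project: str
    source: str   # source-language string
    locale: str   # target locale, e.g. "nb"


class UpstreamPlugin(Protocol):
    """Hypothetical link to an external source (VCS, wiki, CMS, ...)."""

    def push(self, project: str, locale: str, units: Dict[str, str]) -> None:
        ...


@dataclass
class TranslationRepository:
    """Stores translations per unit, independent of any upstream system."""
    units: Dict[UnitKey, str] = field(default_factory=dict)

    def store(self, project: str, source: str, locale: str, target: str) -> None:
        self.units[UnitKey(project, source, locale)] = target

    def suggest(self, source: str, locale: str) -> Optional[str]:
        """Reuse: return an existing translation from any project, if one exists."""
        for key, target in self.units.items():
            if key.source == source and key.locale == locale:
                return target
        return None

    def push_upstream(self, project: str, locale: str, plugin: UpstreamPlugin) -> None:
        """Hand one project's translations for a locale to a plug-in."""
        batch = {
            key.source: target
            for key, target in self.units.items()
            if key.project == project and key.locale == locale
        }
        plugin.push(project, locale, batch)


class RecordingPlugin:
    """Stand-in plug-in that only records what it was asked to push."""

    def __init__(self) -> None:
        self.pushed: List[Tuple[str, str, Dict[str, str]]] = []

    def push(self, project: str, locale: str, units: Dict[str, str]) -> None:
        self.pushed.append((project, locale, dict(units)))


if __name__ == "__main__":
    repo = TranslationRepository()
    repo.store("project-a", "Cancel", "nb", "Avbryt")
    # A second project can reuse the translation stored by the first.
    print(repo.suggest("Cancel", "nb"))
    plugin = RecordingPlugin()
    repo.push_upstream("project-a", "nb", plugin)
    print(plugin.pushed)
```

The point of the sketch is only the separation of concerns: the repository
knows about translation units, while anything specific to a version control
system, wiki or CMS stays behind the plug-in interface.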

We have evaluated a number of existing open source L10N frameworks and
systems, but haven't (yet) found any that stands out or satisfies our needs
and requirements as a development platform. Technology-wise, we are aiming to
develop a Java-based(!) system, using technology such as JBoss Seam,
Hibernate, jBPM and RichFaces. A Java-based platform will enable us to make
best use of internal expertise in these technologies, as well as make use of
technology we are developing (as open source) through collaboration with
partners in the L10N industry.

We hope some of these requirements and ideas will excite some of you, and 
ultimately lead to something that can be of use to open source communities. 
While we have certain requirements and goals for this internally within the 
company, there is no need for this to be an 'internal' Red Hat project, and 
most of the requirements and needs overlap with those of community projects 
like Fedora. In other words, by developing this in collaboration with the 
community from a very early stage, we are more likely to develop something 
that may be of use to the greater community. 

Thoughts and comments, all sorts of comments, are very welcome. 

cheers,
asgeir frimannsson
(Senior Software Engineer, I18N Engineering, Red Hat APAC)