Planning a future L10N infrastructure (including Fedora)

Mon Sep 22 01:16:20 UTC 2008

Hi Dimitris,

Thanks for your comments.

----- "Dimitris Glezos" <dimitris at glezos.com> wrote:
> 2008/9/17 Asgeir Frimannsson <asgeirf at redhat.com>:
> > On Tuesday 16 September 2008 23:29:32 Mike McGrath wrote:
> >> > >
> >> > > Please correct me if I'm reading this wrong but I see "transifex is
> >> > > great or close to it" and "here's how we're going to build our own
> >> > > solution anyway" ?
> >> >
> >> > Yes, "Transifex is great and will continue to serve us".
> >> >
> >> > BUT:
> >> >
> >> > If you look at the state of the art in L10N outside the typical Linux
> >> > projects where PO and Gettext rule, you'll notice we are very short on
> >> > areas like:
> >> > - Translation Reuse
> >> > - Terminology Management
> >> > - Translation Workflow and Project Management
> >> > - Integration with CMSs.
> >> > - Richer Translation Tools
> >> >
> >> > This is an effort in narrowing that gap, and I can't see that effort work
> >> > by evolving an existing tool from this 'cultural background'. Yes, we can
> >> > get some of the way by developing custom solutions for e.g. linking wikis
> >> > to Transifex for CMS integration, or using e.g. Pootle for web-based
> >> > translation. But we would still be limited to the core architecture of
> >> > the intent of the original developers, which is something that would
> >> > radically slow the project down.
> 
> For the record, I believe these are some fine ideas, which I would
> like to see added to Transifex as features (eg. through plugins). I
> have been discussing most of them with people around conferences for
> the past year. An example: Tx already downloaded all the translation
> files from upstream projects, so if someone requests a translation
> file, why not be able to pre-populate it using existing translations
> from all the other projects (translation reuse)?
> 
> Also, I should mention that Transifex isn't (and will never be)
> specific to a particular translation file format (eg. PO) or any
> translation repository. I'd like to support translation of both PO and
> XLIFF files. And also support not only VCSs, but CMSs, wiki pages and
> even arbitrary chunks of text. Transifex's goal is to be a platform to
> help you manage your translations.

For the record (since XLIFF is mentioned and since I'm part of the OASIS XLIFF Technical Committee), I am not aiming to design anything around XLIFF in this project, other than perhaps supporting XLIFF as an import/export format for resources in the same way as we support PO (we do have the odd XLIFF file coming through for translation). I don't think XLIFF (1.2) is mature enough yet as an L10N resource format.

I know there are some big ideas in Transifex. In fact, when Transifex is mentioned, people often refer to the *goal/idea* of Transifex, rather than the actual current implementation. Take plugins, for example: Transifex doesn't currently have a plugin system, nor does it have workflow, project management, or any concept of translation resources internally. Transifex today is a simple 'file submission system' with a growing community aiming to build it into something more. With this in mind, 'building on top of Transifex' really means redefining what Transifex is. For example, 'file submission' should really be a plugin, not a core feature. That means all of Transifex today (excluding maybe the login UI) should really be plugins to a core model of projects, people, etc. that currently doesn't exist.
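
To make the 'core model plus plugins' idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how Transifex is implemented today: the names Project, Core, register, run and submit_file are all hypothetical. The only point is that the core knows about projects and people, while file submission is just one handler registered against it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Project:
    """Hypothetical core model: a project and its people, nothing about files."""
    name: str
    translators: List[str] = field(default_factory=list)


# A plugin is just a callable taking a project and a payload dict.
Plugin = Callable[[Project, dict], str]


class Core:
    """Hypothetical core: stores projects and dispatches actions to plugins."""

    def __init__(self) -> None:
        self.projects: Dict[str, Project] = {}
        self._plugins: Dict[str, Plugin] = {}

    def add_project(self, project: Project) -> None:
        self.projects[project.name] = project

    def register(self, action: str, handler: Plugin) -> None:
        # The core does not know what an action does, only who handles it.
        self._plugins[action] = handler

    def run(self, action: str, project_name: str, **payload) -> str:
        if action not in self._plugins:
            raise KeyError(f"no plugin registered for {action!r}")
        return self._plugins[action](self.projects[project_name], payload)


def submit_file(project: Project, payload: dict) -> str:
    """File submission written as a plugin rather than a core feature."""
    return f"{payload['user']} submitted {payload['path']} to {project.name}"


core = Core()
core.add_project(Project("example-project", translators=["asgeir"]))
core.register("submit", submit_file)
result = core.run("submit", "example-project", user="asgeir", path="po/is.po")
```

Workflow, terminology management or translation reuse would then be further handlers registered in the same way, without the core having to change.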

Defining this 'model' of a repository doesn't really depend much on the implementation, and in fact having several implementations might help push this faster and ensure a better solution (if it was on the Tx roadmap in the first place). And it's not as if it is impossible for e.g. a Java-based repository to communicate with Transifex for file submissions; isn't that exactly what the remote interface of Tx (on the roadmap) is supposed to provide? What I'm hearing is "Don't build something new, continue building on the Python/TG/Transifex architecture", which is fully understandable. However, considering the cost of developing this on top of Tx (re-architecting, convincing everyone that it is the right path to take, the immaturity and stability of libraries for e.g. AJAX, limited workflow support), I honestly think it's better to have two projects that 'complement' each other. There are more than enough tasks for everyone in the existing Tx roadmap, and the idea is bigger than what a combined development team could accomplish. Diversifying and pulling in good people from e.g. the Java side of things might even help speed things up.

> >> Correct me if I'm wrong though, instead of forking or adapting or working
> >> with upstream, you are talking about doing your own thing right?
> >
> > We have a goal of where we want to see L10N infrastructure go, to enable us in
> > the future to provide internal (translators paid by Red Hat) and community
> > translators with tools to increase their productivity as well as better tools
> > to manage the overall L10N process. If there is an 'upstream' that provides
> > this, or a platform on to which we could develop this, then yes, we would
> > consider 'working with upstream' or (in a worst-case-scenario) forking
> > upstream.
> 
> The Translate Toolkit folks are a very friendly bunch, actively
> maintaining and extending the rich library, and always open to
> suggestions. Maybe some (if not all) of the features could be done in
> TT, and the rest that might not fit there, as Python libraries to
> maximize interoperability and community involvement.

Yes, I know TT very well, and have discussed the library with Dwayne Bailey (the main visionary behind the project) in the past, even before Tx was born. In fact, a Django migration of Pootle (built on top of TT) has been on the agenda for a while, and combining forces with TT is one of the other options I have been strongly considering for a repository (TT e.g. has a file submission library, and there is a lot of duplication between TT and Tx). Looking at the SVN activity of TT (in my RSS reader), it is definitely a project with a 'dangerous' future.

> I also think that Transifex could serve as the "UI" for a lot of
> translation-specific tasks. If there's a library that does X, that
> would help people manage their translations or leverage Transifex's
> strong points of "I read a lot of repositories" and "I write to some
> repositories", then we could provide a web wrapper around it. (eg.
> search for string "X" in all translation files of language "Y", or
> "mark <this> file as a downstream of <that> and send me an msgmerged
> file whenever <that> changes".
> 
> > So to answer your question bluntly, YES - after 4 years involvement in
> > industry and community L10N processes - I believe we can do better. But
> > holding that thought, remember that this is in many ways 'middleware', and
> > making use of e.g. the vast amount of knowledge invested in Translate Toolkit
> > (file format conversions, build tools, QA) makes sense, and I'm not saying
> > 'forget about all that we have invested in tools so far'.
> 
> It might be my poor English or the fact that I usually read long mails
> at night, but despite the lengthy descriptions I still don't have a
> clear picture of exactly what problem you'd like to solve, and the
> reasoning behind the decisions being made.

I do understand there is a 'semantic gap' here, and that we do need to provide a better description and demonstration of why a new project is necessary. I do believe it is theoretically possible to build everything on top of Python/TG and through reuse of concepts in e.g. Tx and TT, but I honestly believe that if we are going to manage and drive the development effort in this, it is more worthwhile to expand beyond the Fedora/Python community, and use tools that the core developers would be more comfortable and productive with. This is not a 'we think you guys should develop this' request; we are taking ownership of the project, as well as inviting anyone in the community who is interested to participate and take ownership.

> Don't take me wrong -- I think there are some good ideas. But I feel
> it would be too bad if you guys didn't invest on top of existing tools
> (TT for file formats, Transifex for file operations and UI, OmegaT for
> translation memory) or just isolate specific solutions that don't fit
> into other projects in well-defined libraries (do one thing, do it
> right). Sure, it takes a lot more effort to work *with* other people,
> but it is usually worth it. :-)

This is *not* about an effort to avoid working with people. It is an effort to get more people working on this. I know more people in the Java community who are or might be interested in an open source solution for these problems than in the Python/Fedora/TG community. Add to this a portion of my natural bias towards Java, and the fact that the people who would be working on this would initially be much more productive in Java than in Python (TG2 or Django).

Please take the fact that we are throwing this idea out to the Fedora/Tx community early as a sign that we are trying to work with the community, rather than simply developing something on our own. And I for one will continue being involved with Tx to some degree, and help out where I can. L10N is an area with a lot of room for improvement, and an area that has sadly been to some extent 'neglected' except for Dimitris' recent work. We still have a long way to go before we have what I would call an L10N infrastructure that serves translators well.

cheers,
asgeir