rawhide report: 20050405 changes

Konstantin Ryabitsev mricon at gmail.com
Tue Apr 5 14:38:36 UTC 2005


On Apr 5, 2005 10:17 AM, Daniel Veillard <veillard at redhat.com> wrote:
>   Is that worth adding yet another XML Parser package to the distribution
> used by a single tool ? 

Yes, I believe it is. In my book, being 2-3 times faster is definitely
"worth it."

> Is there a compatibility layer to still use libxml2 ?

I don't mean to come across as rude, but libxml2 has a very clunky
non-pythonic API. I'd choose cElementTree if only because I don't have
to use MethodNamesThatStartWithCapsAgainstAllConventions(). It also
has sensible error reporting (i.e. not just a segfault, which is not
useful in python).
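
To illustrate the error-reporting point, here is a minimal sketch. It
assumes the standard-library xml.etree.ElementTree module (the modern
descendant of python-elementtree) rather than the 2005-era cElementTree
import, and the malformed snippet and the try_parse helper are made up
for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately malformed input: <name> is closed by </package>.
BROKEN = "<package><name>yum</package>"


def try_parse(text):
    """Return (root, None) on success, or (None, (line, column)) on a parse error."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # ParseError carries the location of the problem as a (line, column) tuple.
        return None, err.position


root, where = try_parse(BROKEN)
if root is None:
    print("malformed XML at line %d, column %d" % where)
```

The failure surfaces as an ordinary Python exception that the caller can
catch and report, rather than taking the interpreter down with it.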

In other words, cElementTree feels like a Python library, as opposed
to libxml2, which is very obviously a set of bindings to a C API done
as an afterthought.

> If I remember correctly, the performance problem wasn't libxml2 itself
> but the specific usage within yum, i.e. collecting the data, libxml2 by
> itself is parsing the megabyte sized file in less than a tenth of a second.

I believe it wasn't "within yum" but "within python," specifically
the conversion from C strings to python strings, which took a lot of
resources. That's all that matters to yum, since, well, it's written
in python. cElementTree outperformed libxml2 in our tests and resulted
in much nicer code. I'm the one who did the testing and convincing, so
all blame and hatemail should be aimed at me.

> I'm surprised the solution ends up going to use a python specific library
> instead of trying to find why the interface between libxml2 and yum generated
> that problem. I don't remember you saying you would switch library as a result.

cElementTree (part of python-elementtree) is not a python-specific
library. It's a python interface to expat, and a very well-designed
one. It has fewer features than libxml2, for sure, but it's far more
pleasant to use in python.
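
For a rough idea of what that looks like in practice, here is a small
sketch. It assumes the standard-library xml.etree.ElementTree module,
which keeps the ElementTree API, instead of the original cElementTree
import. The XML snippet, its element names, and the list_packages helper
are invented for illustration and are not yum's actual metadata format:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata snippet; real repodata uses a different, namespaced schema.
SAMPLE = """<metadata>
  <package arch="noarch"><name>yum</name><version ver="1.0"/></package>
  <package arch="i386"><name>libxml2</name><version ver="2.0"/></package>
</metadata>"""


def list_packages(xml_text):
    """Return a (name, version, arch) tuple for each <package> element."""
    root = ET.fromstring(xml_text)
    return [
        (pkg.findtext("name"), pkg.find("version").get("ver"), pkg.get("arch"))
        for pkg in root.findall("package")
    ]


packages = list_packages(SAMPLE)
for name, ver, arch in packages:
    print(name, ver, arch)
```

Elements behave like ordinary Python objects: child lookups return
elements or plain strings, and attributes are read with get(), much like
a dictionary.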

Kind regards,
-- 
Konstantin Ryabitsev
Zlotniks, INC



