[Libosinfo] RFC: Splitting off database into separate package

Daniel P. Berrange berrange at redhat.com
Thu Sep 17 12:45:49 UTC 2015


On Fri, Jul 24, 2015 at 04:54:57PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 24, 2015 at 04:50:34PM +0200, Christophe Fergeau wrote:
> > >  - Should we restructure the database ?
> > > 
> > >    eg, we have a single data/oses/fedora.xml file that contains
> > >    the data for every Fedora release. This is already 200kb in
> > >    size and will grow forever. If we split up all the files
> > >    so there is only ever one entity (os, hypervisor, device, etc)
> > >    in each XML file, each file will be smaller in size. This would
> > >    also let us potentially do database minimization. eg we could
> > >    provide a download that contains /all/ OS, and another download
> > >    that contains only non-end-of-life OS.
> > 
> > I was about to make the same comment as Zeeshan, GNOME has had issues in
> > the past with data scattered among too many small files, in general this
> > is solved by adding a cache file containing a concatenated version of
> > all the files (possibly pre-parsed to some domain-specific format).
> 
> If we can avoid loading the entire database, and only load the subset
> of files we want info on, we'd hopefully not have such problems. I
> could see benefit in having some "index" file perhaps which says
> which entity is defined in which file, as a way to avoid dictating
> a filename/dirname convention.

FYI, I wrote a simple perl script to process our current XML files
and split them up into 1 file per entity... This resulted in 438
individual XML files.
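
For illustration only, the splitting step can be sketched roughly as
below. This is not the Perl script mentioned above; it is a minimal
Python sketch that assumes each database file has a single root element
whose direct children are the entities, each carrying an "id" attribute.
The function names and the filename scheme are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def entity_filename(entity_id):
    """Turn an entity id (typically a URI) into a flat filename.

    The naming scheme is an assumption made for this sketch only.
    """
    return re.sub(r"[^A-Za-z0-9.]+", "-", entity_id).strip("-") + ".xml"


def split_entities(xml_text):
    """Return a list of (entity_id, xml_document) pairs, one per entity.

    Each child of the root element is re-wrapped in a copy of the
    root element so that every output document stands on its own.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for index, entity in enumerate(root):
        # Fall back to tag plus position if an entity has no id.
        entity_id = entity.get("id", f"{entity.tag}-{index}")
        wrapper = ET.Element(root.tag, root.attrib)
        wrapper.append(entity)
        docs.append((entity_id, ET.tostring(wrapper, encoding="unicode")))
    return docs


def write_split(xml_text, out_dir):
    """Write one file per entity into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for entity_id, doc in split_entities(xml_text):
        path = out / entity_filename(entity_id)
        path.write_text(doc, encoding="utf-8")
        paths.append(path)
    return paths
```

Something like write_split(Path("data/oses/fedora.xml").read_text(), "split/")
would then produce one file per Fedora release, assuming the file
follows the layout described above.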

I timed how long libosinfo takes to load the database with the current
database structure, and with the split structure. There was no
measurable difference in load time. I repeated the test using
vm.drop_caches=3 to clear the FS cache between timings, and still found
no difference in load time. So I think our load time is not dominated
by the number of files we have - most likely the XML parsing & object
allocation is our main timesink.

FWIW, with warm cache it was ~250ms, with cold cache it was 1.9s, though
for the latter number I don't know how much of that time is from loading
the ELF libraries, vs the database. Anyway, it didn't differ between the
split and unsplit layouts.
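
A comparison along these lines can be approximated with a small timing
harness. The sketch below is an assumption-laden stand-in: it does not
call libosinfo at all, it only parses every XML file under a directory
with Python's ElementTree, so it measures file access plus XML parsing
but none of libosinfo's object allocation or library loading. The
function names are hypothetical. For cold-cache numbers the FS cache
would have to be dropped between runs (vm.drop_caches=3, as root),
which the sketch leaves to the caller.

```python
import time
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_all(db_dir):
    """Parse every .xml file under db_dir; return how many were parsed."""
    count = 0
    for path in sorted(Path(db_dir).rglob("*.xml")):
        ET.parse(str(path))
        count += 1
    return count


def time_load(db_dir, repeats=5):
    """Return a list of wall-clock durations (seconds), one per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse_all(db_dir)
        timings.append(time.perf_counter() - start)
    return timings
```

Running time_load() once against the original tree and once against
the split tree, and comparing the minimum of each list, gives a rough
idea of whether the file count matters for the parsing part alone.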

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



