[Libosinfo] [RFC] Providing apps an easier way to update osinfo-db

Daniel P. Berrangé berrange at redhat.com
Wed Jul 4 15:46:32 UTC 2018


On Wed, Jul 04, 2018 at 08:46:11AM +0200, Fabiano Fidêncio wrote:
> On Tue, Jul 3, 2018 at 1:54 PM, Daniel P. Berrangé <berrange at redhat.com> wrote:
> > On Tue, Jul 03, 2018 at 01:32:28PM +0200, Fabiano Fidêncio wrote:
> >> Folks,
> >>
> >> One thing that has been discussed between Felipe Borges (GNOME Boxes
> >> maintainer) and myself is the possibility to, from an app, download
> >> and install the latest osinfo-db content without depending on
> >> distribution packagers, ensuring that users will have the most
> >> up-to-date db as soon as something is introduced/changed.
> >>
> >> In order to do so, we'd have to (according to my understanding):
> >> - Support URLs as arguments for osinfo-db-import:
> >>   This step would allow the apps to download the latest release from
> >> our official source;
> >
> > This is fine, and needed no matter what else is done I think.
> >
> >> - Expose a "latest" osinfo-db build:
> >>   Currently we tag our releases, make a tarball and this tarball is
> >> uploaded to pagure. The tarball's name looks like
> >> "osinfo-db-20180612.tar.xz" and, preferably, we'd like to have
> >> something like "osinfo-db-latest.tar.xz" pointing to the latest
> >> release. Actually, in an ideal world, we'd like to have an
> >> osinfo-db-latest.tar.xz pointing to the latest osinfo-db commit
> >> (then, maybe, we would have to rely on gitlab infra to do that?);
> >
> > I think the idea of a -latest.tar.xz link is a mistake, as it
> > encourages apps to download the content even if it hasn't changed,
> > since they can't easily see whether it is a new version or not.
> 
> With the -latest naming I was envisioning that we could ship not
> official releases, but git master.
> So, first off, the name is, at least, misleading. :-/
> 
> Daniel,
> You think that shipping git master would also be a mistake? We can
> easily do a release on any day we have patches and update the release
> in pagure without issues. But it becomes problematic when it comes to
> distributing those to Fedora (considering that the time for a release
> to reach the distro is ~2 weeks from the release day ... and I'm
> packaging it as soon as a release is done).

I'd be pretty wary of shipping git master without any human
interaction, given our current way of testing.

All our tests are run post-commit, so we don't know that things
are 100% working correctly at the time of push. We've also had cases
where we simply wanted to change things before releasing them.
This is particularly important for URLs.

So I would not want applications pulling down git master based
content, at least not with our current workflow.

> > Also we must not have any applications hardcoding releases.pagure.org
> > URLs. I do not have confidence in pagure.org sticking around over the
> > long time frames RHEL lives for.
> >
> >> Ideas? Possible limitations?
> >
> > I've been thinking about live updates for quite a while - ever since I
> > created the osinfo-db-tools separate package in fact :-)
> >
> > The two key problems with any live update mechanism are scalability and
> > compatibility. We need to ensure that applications don't repeatedly
> > hit the web server to download unchanged content over & over. We also
> > need to ensure that we don't bake in usage of a URL that is liable
> > to change over time. Whatever we do will end up in RHEL releases that
> > can live for 10+ years, and I don't have confidence our hosting will
> > live for 10+ years without moving at least once.
> >
> > My thought was to leverage DNS as a key part of the solution.  Have DNS
> > TXT record exposing the current live update URL, i.e. a TXT query against
> > "dbupdate.libosinfo.org" could return something akin to:
> >
> >   url=https://releases.pagure.org/libosinfo/osinfo-db-20180612.tar.xz,version=20180612
> >
> > The locally installed osinfo-db content always creates a file VERSION in
> > the root of the install location. So that content can be compared against
> > the DNS result to see if it is outdated or not, and thus avoid hitting
> > the web server at all unless there's genuinely a change that needs
> > acquiring.
> >
> > I would anticipate some osinfo-db-update-check tool to perform this work
> > and print out a URL, that could then be given to osinfo-db-import.
> 
> I like the idea!
> 
> Who's the owner of libosinfo.org? Daniel? Red Hat?

I own & host it.

> Daniel,
> Did you start poking at this in the past? If not, it's something I can
> slowly start taking a look at.

No, I've not done any active work on it.

> > It feels like a cron job to kick off periodic updates would be nice, but
> > the downside with that is that it synchronizes a DoS attack from all
> > deployed hosts (modulo timezone differences).
> 
> The idea of what will be used in Boxes is something to be discussed
> during its hackfest, which will happen next Monday.
> I'm expecting that a cron job will be used for generating the flatpak.
> I'm not sure whether we also want an option to update the osinfo-db
> on non-flatpak'ed apps (aka the ones normally shipped by OSes),
> although I do believe GNOME Boxes should!
> 
> >
> > Security is, however, a key issue here. For this to be practical at the
> > very least it would likely require either DNSSEC, or embedding a crypto
> > signature in the result. The latter would however require a long term
> > pub key to be embedded in the client, and I'm not a fan of that.
> >
> > There is a concern of reinventing the wheel here. There is some kind of
> > industry standard I vaguely recall for doing software/content updates
> > over the internet. I've not investigated that in enough detail to
> > understand if it avoids the scalability & compatibility concerns I
> > have, though.
> 
> Hmm. Right, it makes sense. I'll check what the standard is and how we
> could take advantage of that.

This is the thing I was remembering:

   https://theupdateframework.github.io/

It is, however, a fairly complicated spec. It avoids apps pulling down
the actual content when there is no change, but apps will still be
querying the server for metadata about the content. Their approach to
this appears to expect you to set up mirroring/redundancy in your
hosting solution.

This is fine for big orgs like distros or commercial vendors with money
to spend, but it's more limiting for a small-scale project like libosinfo,
which could nonetheless see notable traffic growth.

We're smaller in terms of data size, but I'm wary of what happened
with LVFS, where the Amazon bills ended up being $100/month:

  https://blogs.gnome.org/hughsie/2018/04/03/the-lvfs-cdn-will-change-soon/

Having read more about the TUF spec, though, I can see it covers many
security flaws that would affect the "simple" DNS-based impl I mentioned.
So I'm not too sure how to go forward.
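For reference, the version-comparison half of the DNS-based check I
sketched earlier could look something like the snippet below. The
"dbupdate.libosinfo.org" record name, the "url=...,version=..." payload
format and the function names are all assumptions from this thread, not
an implemented protocol; the actual DNS lookup is deliberately left out.

```python
# Hedged sketch of an osinfo-db-update-check tool's core logic:
# parse a TXT record payload of the assumed form
# "url=URL,version=YYYYMMDD" and compare it against the version
# recorded in the local osinfo-db install's VERSION file.

def parse_update_record(txt):
    """Split 'url=URL,version=YYYYMMDD' into (url, version)."""
    fields = dict(part.split("=", 1) for part in txt.split(","))
    return fields["url"], fields["version"]

def check_update(local_version, txt_record):
    """Return (url, True) when the advertised db is newer than ours."""
    url, remote_version = parse_update_record(txt_record)
    # osinfo-db versions are date stamps (YYYYMMDD), so plain string
    # comparison orders them correctly.
    return url, remote_version > local_version

# Example payload, as it might come back from a TXT query against the
# hypothetical dbupdate.libosinfo.org record:
record = ("url=https://releases.pagure.org/libosinfo/"
          "osinfo-db-20180612.tar.xz,version=20180612")

# Compare against the contents of a local install's VERSION file:
url, outdated = check_update("20180518", record)
```

A tool built on this would print the URL only when `outdated` is true,
so osinfo-db-import never hits the web server for unchanged content.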

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
