[Libguestfs] [PATCH nbdkit] curl: Add effective-url flag

Laszlo Ersek lersek at redhat.com
Thu Oct 14 15:18:41 UTC 2021


On 10/13/21 12:06, Richard W.M. Jones wrote:
> Probably best to read this first:
> https://bugzilla.redhat.com/show_bug.cgi?id=2013000
> 
> This adds an effective-url=true|false flag to nbdkit-curl-plugin.  If
> true, the first time we fetch the URL, we fetch the "effective" URL
> (ie. the URL after all redirects were resolved) and use that for all
> future connections within the current nbdkit instance.
> 
> The idea behind this patch is to do with the Fedora mirror system
> which is flakey.  Some mirrors return errors or 404s.  And the mirror
> system will give you a new redirect each time (within your
> geographical region).  This results in transfers sometimes failing
> because a single read went to a bad mirror.
> 
> After implementing this patch I'm not very happy with it.  It is
> incomplete because we pass the original URL to cookie-script and
> header-script, so plugin/curl/scripts.c will also need to be modified.
> Once those modifications are done the change becomes quite invasive.
> Also I'm not convinced that it really solves any problem.  In the
> manual change I wrote:
> 
>     Note use of this feature in long-lived nbdkit instances can cause
>     subtle problems:
> 
>     •   The effective URL persists across connections for the lifetime
>         of the nbdkit instance.  If nbdkit is used for a long time then
>         it is possible for the redirected URL to become stale.
> 
>     •   It will defeat some mirror load-balancing techniques.
> 
>     •   If the mirror service sometimes redirects to a broken URL and
>         it happens that the URL you fetch first is broken then nbdkit
>         will no longer recover on subsequent connections (instead you
>         will need to restart nbdkit).
> 
> I suggested another way to solve this by using curl APIs to fetch the
> effective URL up front and passing that URL to nbdkit (see
> https://bugzilla.redhat.com/show_bug.cgi?id=2013000#c1), but
> apparently that solution isn't acceptable for unclear reasons.
> 
> I can't think of any other way to solve this in the context of nbdkit
> (maybe have it detect when redirection is happening and retry the
> redirection on error?).  So here's the patch.

Given that I've been CC'd, here's my opinion:

I strongly dislike transparent mirror selection (redirects) as a
principle (based on experience). Precisely with the Fedora mirror
system, I frequently see "dnf update" download some packages at
lightning speed, then get stuck *completely* at some other package
(potentially after spewing a bunch of "broken mirror" messages at me),
so that I have to Ctrl-C the whole command, and re-issue it. Transparent
redirects seem to want to hide the "mess" from the user, but they fail
to do that quite frequently, IMO.

The Cygwin experience is better, IMO. When you start the Cygwin package
installer (whether you do it for initial installation or for updating
packages), a fresh list of mirrors is fetched from some central location
(I think?), but then you, the user, have to pick a *specific* mirror
from that list. I always just go for fsn.hu, which I know to be a
rock-stable mirror (for many OS distros, including Fedora) in my
location. No bad surprises using that mirror, ever.

With that in mind, I wouldn't complicate *any* application to deal with
redirects / mirror selection transparently. Whatever the application
does, the user will not be happy, and will want to tweak the logic. Just
let the user pick an effective URL themselves, and stick with that forever.

This may not be great for "load balancing", but AIUI the pain point here
is failed *individual* imports. I think it should be OK to stick with a
particular fixed URL for the duration of an import.
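
As a rough illustration of "resolve once, then stick with it", here is a
minimal Python sketch. It is only a sketch under assumptions:
resolve_effective_url and nbdkit_argv are hypothetical helper names (not
part of nbdkit or curl), and it uses Python's urllib to follow the
redirects rather than the curl APIs mentioned in the bug. The only nbdkit
detail it relies on is the curl plugin's url= parameter.

```python
import urllib.request


def resolve_effective_url(url: str, timeout: float = 30.0) -> str:
    """Follow HTTP redirects once and return the final ("effective") URL.

    Hypothetical helper for illustration; not part of nbdkit or curl.
    """
    # urlopen() follows redirects by default and returns once the response
    # headers have arrived.  The body is never read here; leaving the
    # "with" block closes the connection.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def nbdkit_argv(effective_url: str) -> list[str]:
    """Build an nbdkit command line pinned to one fixed URL.

    Hypothetical helper; assumes only the curl plugin's url= parameter.
    """
    return ["nbdkit", "curl", f"url={effective_url}"]
```

The calling application would run resolve_effective_url() a single time
before the import, check that the result looks sane, and then start nbdkit
with nbdkit_argv(...) so that every later request goes to the same mirror.
If that mirror turns out to be broken, the whole import fails early and
visibly, and the user can pick a different URL.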

(I should actually update my DNF repo files on Fedora to use fsn.hu as
well, I just always get discouraged by the "metalink" stuff in there,
and the hard-to-read variables (such as "$releasever", "$basearch", ...).)

From Alexander's description, it's clear that the reliability of the
mirror network is the core issue here, and we're now pushing around the
unwanted job of hiding it from the user, from one application to the
other. I'd say let the *user* deal with it, *once*, in their
configuration, and neither application (= neither nbdkit nor the app
that starts nbdkit) should struggle with redirects.

In particular, tolerating (following) redirects for every single Range
request looks incredibly inefficient to me (not to mention, brittle).

(Sorry if my opinion is too naive.)

Thanks
Laszlo



