[Pulp-dev] pulp_rpm and The Pitfalls of urljoin()
Quirin Pamp
pamp at atix.de
Thu Mar 18 07:59:49 UTC 2021
Hi,
it looks like there is now a commit on pulp_rpm that uses something named "urlpath_sanitize" instead of urljoin.
This is causing a major clash with this PR: https://github.com/pulp/pulp_rpm/pull/1896 which has replaced a lot of urljoin's with os.path.join.
My problem is that since I am not the original author of this PR (I merely adopted it), I am not sure what the exact background of the switch to "os.path.join" is.
I presume, it was necessary (and perhaps trying to solve the same issue as the "urlpath_sanitize").
I can of course rebase the path and test it out in either version (using urlpath_sanitize vs os.path.join), but if you have already put some recent thought into the intricacies of urljoin, I would appreciate your input before I do. 😉
Regards,
Quirin (quba42)
________________________________
From: pulp-dev-bounces at redhat.com <pulp-dev-bounces at redhat.com> on behalf of Grant Gainey <ggainey at redhat.com>
Sent: 15 March 2021 20:43
To: Pulp-dev <pulp-dev at redhat.com>
Subject: [Pulp-dev] pulp_rpm and The Pitfalls of urljoin()
Hey folks,
I was looking at https://pulp.plan.io/issues/7995 with an eye to fixing it for 3.12. The underlying problem can be summed up as "urljoin doesn't do what you expect if 'base' doesn't end in a '/'"[0][1]. Using urljoin() as if all it does is "concatenate these two strings with a '/' between them" is a pretty common misuse that works 'most of the time', alas.
Looking around to see if we do this anywhere else, I find that pulp_rpm uses urljoin() in a number of places that I think might be subject to the same unintended problem. However, the semantics of sync'ing RPM repositories is even more nuanced than urljoin()'s is!
I am tempted to replace urljoin() with just straightforward path-creation. Here's the list of places in pulp_rpm non-test python that uses it currently:
(pulp3) (master) ~/github/Pulp3/pulp_rpm $ find . -name \*.py | grep -v tests | xargs grep -n "urljoin("
./pulp_rpm/app/kickstart/treeinfo.py:23: url=urljoin(remote_url, namespace), silence_errors_for_response_status_codes={403, 404}
./pulp_rpm/app/models/repository.py:355: gpgkey_path = urllib.parse.urljoin(
./pulp_rpm/app/models/repository.py:358: gpgkey_path = urllib.parse.urljoin(gpgkey_path, self.base_path, True)
./pulp_rpm/app/tasks/synchronizing.py:101: downloader = remote.get_downloader(url=urljoin(url, "repodata/repomd.xml"))
./pulp_rpm/app/tasks/synchronizing.py:138: downloader = remote.get_downloader(url=urljoin(remote_url, "repodata/repomd.xml"))
./pulp_rpm/app/tasks/synchronizing.py:243: new_url = urljoin(remote_url, path)
./pulp_rpm/app/tasks/synchronizing.py:430: url=urljoin(self.data.remote_url, "repodata/repomd.xml")
./pulp_rpm/app/tasks/synchronizing.py:460: url=urljoin(self.data.remote_url, self.treeinfo["filename"]),
./pulp_rpm/app/tasks/synchronizing.py:470: url=urljoin(self.data.remote_url, path),
./pulp_rpm/app/tasks/synchronizing.py:599: url = urljoin(self.data.remote_url, package.location_href)
./pulp_rpm/app/tasks/synchronizing.py:722: repodata_url = urljoin(self.data.remote_url, record.location_href)
./pulp_rpm/app/tasks/synchronizing.py:726: self.data.updateinfo_url = urljoin(self.data.remote_url, record.location_href)
./pulp_rpm/app/tasks/synchronizing.py:731: comps_url = urljoin(self.data.remote_url, record.location_href)
./pulp_rpm/app/tasks/synchronizing.py:735: self.data.modules_url = urljoin(self.data.remote_url, record.location_href)
./pulp_rpm/app/tasks/synchronizing.py:743: url=urljoin(self.data.remote_url, record.location_href),
./pulp_rpm/app/downloaders.py:84: url = urljoin(self.url, auth_param)
(pulp3) (master) ~/github/Pulp3/pulp_rpm $
Any thoughts before I dive down this rabbithole? I'm afraid I don't even have a pocketwatch...
G
[0] https://docs.python.org/2/library/urlparse.html#urlparse.urljoin
[1] https://stackoverflow.com/a/10893427
--
Grant Gainey
Principal Software Engineer, Red Hat System Management Engineering
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20210318/272e93f7/attachment.htm>
More information about the Pulp-dev
mailing list