<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
it looks like there is now a commit on pulp_rpm that uses something named "urlpath_sanitize" instead of urljoin.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
This is causing a major clash with this PR: <a href="https://github.com/pulp/pulp_rpm/pull/1896" id="LPlnk919246">
https://github.com/pulp/pulp_rpm/pull/1896</a> which has replaced a lot of urljoin's with os.path.join.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
My problem is that since I am not the original author of this PR (I merely adopted it), I am not sure what the exact background of the switch to "os.path.join" is.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I presume, it was necessary (and perhaps trying to solve the same issue as the "urlpath_sanitize").</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I can of course rebase the path and test it out in either version (using urlpath_sanitize vs os.path.join), but if you have already put some recent thought into the intricacies of urljoin, I would appreciate your input before I do. ðŸ˜‰</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Regards,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Quirin (quba42)<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> pulp-dev-bounces@redhat.com <pulp-dev-bounces@redhat.com> on behalf of Grant Gainey <ggainey@redhat.com><br>
<b>Sent:</b> 15 March 2021 20:43<br>
<b>To:</b> Pulp-dev <pulp-dev@redhat.com><br>
<b>Subject:</b> [Pulp-dev] pulp_rpm and The Pitfalls of urljoin()</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hey folks,
<div><br>
</div>
<div>I was looking at <a href="https://pulp.plan.io/issues/7995">https://pulp.plan.io/issues/7995</a> with an eye to fixing it for 3.12. The underlying problem can be summed up as "urljoin doesn't do what you expect if 'base' doesn't end in a '/'"[0][1]. Using
 urljoin() as if all it does is "concatenate these two strings with a '/' between them" is a pretty common misuse that works 'most of the time', alas.</div>
<div><br>
</div>
<div>Looking around to see if we do this anywhere else, I find that pulp_rpm uses urljoin() in a number of places that I think might be subject to the same unintended problem. However, the semantics of sync'ing RPM repositories is even
<b>more</b> nuanced than urljoin()'s is!</div>
<div><br>
</div>
<div>I am tempted to replace urljoin() with just straightforward path-creation. Here's the list of places in pulp_rpm non-test python that uses it currently:</div>
<div><br>
</div>
<div><font size="1" face="monospace">(pulp3) (master) ~/github/Pulp3/pulp_rpm $ find . -name \*.py | grep -v tests | xargs grep -n "urljoin("
<br>
./pulp_rpm/app/kickstart/treeinfo.py:23:            url=urljoin(remote_url, namespace), silence_errors_for_response_status_codes={403, 404}<br>
./pulp_rpm/app/models/repository.py:355:                gpgkey_path = urllib.parse.urljoin(<br>
./pulp_rpm/app/models/repository.py:358:                gpgkey_path = urllib.parse.urljoin(gpgkey_path, self.base_path, True)<br>
./pulp_rpm/app/tasks/synchronizing.py:101:    downloader = remote.get_downloader(url=urljoin(url, "repodata/repomd.xml"))<br>
./pulp_rpm/app/tasks/synchronizing.py:138:    downloader = remote.get_downloader(url=urljoin(remote_url, "repodata/repomd.xml"))<br>
./pulp_rpm/app/tasks/synchronizing.py:243:            new_url = urljoin(remote_url, path)<br>
./pulp_rpm/app/tasks/synchronizing.py:430:                url=urljoin(self.data.remote_url, "repodata/repomd.xml")<br>
./pulp_rpm/app/tasks/synchronizing.py:460:                    url=urljoin(self.data.remote_url, self.treeinfo["filename"]),<br>
./pulp_rpm/app/tasks/synchronizing.py:470:                    url=urljoin(self.data.remote_url, path),<br>
./pulp_rpm/app/tasks/synchronizing.py:599:                url = urljoin(self.data.remote_url, package.location_href)<br>
./pulp_rpm/app/tasks/synchronizing.py:722:        repodata_url = urljoin(self.data.remote_url, record.location_href)<br>
./pulp_rpm/app/tasks/synchronizing.py:726:        self.data.updateinfo_url = urljoin(self.data.remote_url, record.location_href)<br>
./pulp_rpm/app/tasks/synchronizing.py:731:        comps_url = urljoin(self.data.remote_url, record.location_href)<br>
./pulp_rpm/app/tasks/synchronizing.py:735:        self.data.modules_url = urljoin(self.data.remote_url, record.location_href)<br>
./pulp_rpm/app/tasks/synchronizing.py:743:                url=urljoin(self.data.remote_url, record.location_href),<br>
./pulp_rpm/app/downloaders.py:84:            url = urljoin(self.url, auth_param)<br>
(pulp3) (master) ~/github/Pulp3/pulp_rpm $</font></div>
<div> <br>
</div>
<div>
<div>Any thoughts before I dive down this rabbithole? I'm afraid I don't even have a pocketwatch...</div>
<div><br>
</div>
<div>G</div>
<div><br>
</div>
<div>[0] <a href="https://docs.python.org/2/library/urlparse.html#urlparse.urljoin">https://docs.python.org/2/library/urlparse.html#urlparse.urljoin</a></div>
<div>[1] <a href="https://stackoverflow.com/a/10893427">https://stackoverflow.com/a/10893427</a></div>
-- <br>
<div dir="ltr" class="x_gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Grant Gainey</div>
<div>Principal Software Engineer, Red Hat System Management Engineering</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>