<div dir="ltr">I've been bitten by this a couple times. I also noticed that ansible actually defines its own version of urljoin:<div><br></div><div><a href="https://github.com/ansible/ansible/blob/00bd0b893d5d21de040b53032c466707bacb3b93/lib/ansible/galaxy/api.py#L166-L167">https://github.com/ansible/ansible/blob/00bd0b893d5d21de040b53032c466707bacb3b93/lib/ansible/galaxy/api.py#L166-L167</a><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><br></div><div>David</div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 15, 2021 at 3:44 PM Grant Gainey <<a href="mailto:ggainey@redhat.com">ggainey@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hey folks,<div><br></div><div>I was looking at <a href="https://pulp.plan.io/issues/7995" target="_blank">https://pulp.plan.io/issues/7995</a> with an eye to fixing it for 3.12. The underlying problem can be summed up as "urljoin doesn't do what you expect if 'base' doesn't end in a '/'"[0][1]. Using urljoin() as if all it does is "concatenate these two strings with a '/' between them" is a pretty common misuse that works 'most of the time', alas.</div><div><br></div><div>Looking around to see if we do this anywhere else, I find that pulp_rpm uses urljoin() in a number of places that I think might be subject to the same unintended problem. However, the semantics of sync'ing RPM repositories is even <b>more</b> nuanced than urljoin()'s is!</div><div><br></div><div>I am tempted to replace urljoin() with just straightforward path-creation. Here's the list of places in pulp_rpm non-test python that uses it currently:</div><div><br></div><div><font size="1" face="monospace">(pulp3) (master) ~/github/Pulp3/pulp_rpm $ find . 
-name \*.py | grep -v tests | xargs grep -n "urljoin(" <br>./pulp_rpm/app/kickstart/treeinfo.py:23:            url=urljoin(remote_url, namespace), silence_errors_for_response_status_codes={403, 404}<br>./pulp_rpm/app/models/repository.py:355:                gpgkey_path = urllib.parse.urljoin(<br>./pulp_rpm/app/models/repository.py:358:                gpgkey_path = urllib.parse.urljoin(gpgkey_path, self.base_path, True)<br>./pulp_rpm/app/tasks/synchronizing.py:101:    downloader = remote.get_downloader(url=urljoin(url, "repodata/repomd.xml"))<br>./pulp_rpm/app/tasks/synchronizing.py:138:    downloader = remote.get_downloader(url=urljoin(remote_url, "repodata/repomd.xml"))<br>./pulp_rpm/app/tasks/synchronizing.py:243:            new_url = urljoin(remote_url, path)<br>./pulp_rpm/app/tasks/synchronizing.py:430:                url=urljoin(self.data.remote_url, "repodata/repomd.xml")<br>./pulp_rpm/app/tasks/synchronizing.py:460:                    url=urljoin(self.data.remote_url, self.treeinfo["filename"]),<br>./pulp_rpm/app/tasks/synchronizing.py:470:                    url=urljoin(self.data.remote_url, path),<br>./pulp_rpm/app/tasks/synchronizing.py:599:                url = urljoin(self.data.remote_url, package.location_href)<br>./pulp_rpm/app/tasks/synchronizing.py:722:        repodata_url = urljoin(self.data.remote_url, record.location_href)<br>./pulp_rpm/app/tasks/synchronizing.py:726:        self.data.updateinfo_url = urljoin(self.data.remote_url, record.location_href)<br>./pulp_rpm/app/tasks/synchronizing.py:731:        comps_url = urljoin(self.data.remote_url, record.location_href)<br>./pulp_rpm/app/tasks/synchronizing.py:735:        self.data.modules_url = urljoin(self.data.remote_url, record.location_href)<br>./pulp_rpm/app/tasks/synchronizing.py:743:                url=urljoin(self.data.remote_url, record.location_href),<br>./pulp_rpm/app/downloaders.py:84:            url = urljoin(self.url, auth_param)<br>(pulp3) (master) ~/github/Pulp3/pulp_rpm 
$</font></div><div> <br></div><div><div>Any thoughts before I dive down this rabbithole? I'm afraid I don't even have a pocketwatch...</div><div><br></div><div>G</div><div><br></div><div>[0] <a href="https://docs.python.org/2/library/urlparse.html#urlparse.urljoin" target="_blank">https://docs.python.org/2/library/urlparse.html#urlparse.urljoin</a></div><div>[1] <a href="https://stackoverflow.com/a/10893427" target="_blank">https://stackoverflow.com/a/10893427</a></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div>Grant Gainey</div><div>Principal Software Engineer, Red Hat System Management Engineering</div></div></div></div></div></div></div>
</blockquote></div>
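<div><br></div><div>For anyone following along, here is a minimal sketch of the behaviour described in the quoted message: urljoin() drops the last path segment of 'base' when it doesn't end in a '/'. The example.com URL and the join_url() helper are made up for illustration; they are not pulp_rpm or ansible code, and the helper is only one possible shape for "straightforward path-creation".</div><div><br></div>

```python
from urllib.parse import urljoin


def join_url(base: str, path: str) -> str:
    """Hypothetical helper: put `path` underneath `base`, with exactly one
    '/' between them, whether or not `base` ends in a slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


# Assumed example URL, not taken from any real remote.
base = "https://example.com/repos/fedora"

# Without a trailing slash, urljoin() treats "fedora" as a file name
# and replaces it with the relative path.
print(urljoin(base, "repodata/repomd.xml"))
# https://example.com/repos/repodata/repomd.xml

# With a trailing slash, the relative path is appended as most callers expect.
print(urljoin(base + "/", "repodata/repomd.xml"))
# https://example.com/repos/fedora/repodata/repomd.xml

# The plain concatenation helper gives the same result for both forms of base.
print(join_url(base, "repodata/repomd.xml"))
print(join_url(base + "/", "repodata/repomd.xml"))
# https://example.com/repos/fedora/repodata/repomd.xml (both)
```

<div><br></div><div>One caveat with plain concatenation: if 'path' is itself an absolute URL, urljoin() returns it unchanged, while a helper like this would glue it onto 'base'. So each call site in the grep output would probably need a look before swapping one for the other.</div>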