[Pulp-list] Pulp and package.io

Somers-Harris, David | David | OPS david.somers-harris at mail.rakuten.com
Sun Feb 1 08:00:31 UTC 2015


Thanks for this Michael!

This gives me a little more hope :).

Yes, Chef (like Puppet) bundles its own dependencies so that it doesn't have to worry about dependency issues on each distro, which is probably why their rpm is so large.

In Foreman, when I enter "https://packagecloud.io/chef/stable/el/6/x86_64/" into Repo Discovery, it comes back with nothing. Do you think the rpm/metadata size issue is the cause?

Here are the pulp packages currently installed with Foreman. I noticed that you tested with 2.5 and I have 2.4; that wouldn't be an issue, would it?

pulp-admin-client-2.4.0-1.el7.noarch
pulp-katello-0.3-3.el7.noarch
pulp-nodes-common-2.4.0-1.el7.noarch
pulp-nodes-parent-2.4.0-1.el7.noarch
pulp-puppet-plugins-2.4.0-1.el7.noarch
pulp-puppet-tools-2.4.0-1.el7.noarch
pulp-rpm-admin-extensions-2.4.0-1.el7.noarch
pulp-rpm-plugins-2.4.0-1.el7.noarch
pulp-selinux-2.4.0-1.el7.noarch
pulp-server-2.4.0-1.el7.noarch
python-isodate-0.5.0-4.pulp.el7.noarch
python-kombu-3.0.15-10.pulp.el7.noarch
python-pulp-bindings-2.4.0-1.el7.noarch
python-pulp-client-lib-2.4.0-1.el7.noarch
python-pulp-common-2.4.0-1.el7.noarch
python-pulp-puppet-common-2.4.0-1.el7.noarch
python-pulp-rpm-common-2.4.0-1.el7.noarch
rubygem-hammer_cli_katello-0.0.6-1.el7.noarch

Thanks,
David

-----Original Message-----
From: Michael Hrivnak [mailto:mhrivnak at redhat.com] 
Sent: Sunday, February 01, 2015 5:53 AM
To: Somers-Harris, David | David | OPS
Cc: pulp-list at redhat.com
Subject: Re: [Pulp-list] Pulp and package.io

David,

Thanks for asking about this. Pulp will happily sync any valid yum repository.

Using a variation of the link you provided [0], I was able to sync one of their repositories with pulp 2.5 (with one catch that I'll get to in a moment). However, they don't make it easy to figure out what repository URL to use. I had to hack up their "installer" script [1] to see what link it would generate.

FWIW, it seems that their "el/7/x86_64/" repository has no packages. You can download the XML file [2] that contains a package list and see that it has no entries. If you tried to sync this, there would be no errors, but also no packages retrieved. Perhaps that was a source of confusion?
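
The check described above (download primary.xml.gz and see whether it lists any packages) can be sketched in Python. This is only a minimal sketch: the function name `count_packages` and the `EMPTY_PRIMARY` sample are illustrative and not taken from pulp or packagecloud, and the sample stands in for a real download so that the snippet runs without network access.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by yum's primary.xml metadata.
COMMON_NS = "http://linux.duke.edu/metadata/common"


def count_packages(primary_xml_gz: bytes) -> int:
    """Count the <package> elements in a gzipped primary.xml payload."""
    root = ET.fromstring(gzip.decompress(primary_xml_gz))
    return len(root.findall(f"{{{COMMON_NS}}}package"))


# Hypothetical stand-in for the primary.xml of a repository with no packages.
EMPTY_PRIMARY = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<metadata xmlns="' + COMMON_NS.encode() + b'" packages="0"></metadata>\n'
)

if __name__ == "__main__":
    # In practice the bytes would come from the repodata/primary.xml.gz URL.
    print(count_packages(gzip.compress(EMPTY_PRIMARY)))
```

A count of zero would match the behaviour described above: a sync that finishes without errors but retrieves no packages.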

Just to clarify, their implementation detail of redirecting to S3 links is reasonable and should be completely transparent to the user of any HTTP client, yum and pulp included. For anyone who wants to understand what's going on under the hood, see the example [3] below. Accessing a file's URL returns a 302 redirect to a time-limited S3 URL. I don't know why they're using signed/expiring URLs when the links on packagecloud.io are wide open, but there's certainly no harm.
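
To see how short-lived these signed URLs are, the `Expires` query parameter in the `Location` header of example [3] can be decoded as a Unix timestamp. The sketch below is illustrative only: `signed_url_expiry` is a made-up helper name, and the access key and signature in the sample URL are replaced with placeholders. Only the `Expires` value and the response `Date` come from the example.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> datetime:
    """Return the expiry time encoded in a signed URL's Expires parameter."""
    query = parse_qs(urlsplit(url).query)
    return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)


# Redirect target from example [3]; key and signature are placeholders.
LOCATION = (
    "https://packagecloud-repositories.s3.amazonaws.com/empty/rpm/primary.xml.gz"
    "?AWSAccessKeyId=PLACEHOLDER&Signature=PLACEHOLDER&Expires=1422725140"
)

# Value of the Date header in the same response.
RESPONSE_DATE = datetime(2015, 1, 31, 17, 20, 40, tzinfo=timezone.utc)

if __name__ == "__main__":
    expiry = signed_url_expiry(LOCATION)
    print(expiry.isoformat())
    print((expiry - RESPONSE_DATE).total_seconds())
```

For this particular response the redirect target expires 300 seconds after the response was generated, so a client has to fetch the file shortly after following the redirect.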

Now for the catch... some of these rpms are huge. Not just in bytes (~140MB), but in file count: many contain 50,000-60,000 files. It looks like there is practically an entire operating system bundled into one rpm. While this is technically possible, it's not a normal use of the rpm package format, and pulp is not able to catalog some of these rpms. The problem is that there is so much metadata (mostly the file list) that it literally won't fit into a single mongodb object. Unfortunately, we don't have a good solution right now for handling rpms that large. Ideas are welcome. In theory, we could compress the XML before saving it in the database, but I wonder what impact that would have on our publish performance.
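
The compression idea can be sketched roughly as follows. This is not pulp code: the helper names, the synthetic file paths, and the choice of zlib are all assumptions made for illustration, and the size check assumes the limit in question is MongoDB's 16 MB maximum BSON document size.

```python
import zlib

# MongoDB's maximum BSON document size (16 MB); assumed to be the limit at issue.
MONGODB_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def build_filelist_xml(n_files: int) -> bytes:
    """Build a synthetic file-list snippet; the paths are made up."""
    entries = "".join(
        f"<file>/opt/example/embedded/lib/file_{i}.rb</file>\n"
        for i in range(n_files)
    )
    return f"<package>\n{entries}</package>\n".encode("utf-8")


def compress_metadata(xml_bytes: bytes) -> bytes:
    """Compress metadata before it is stored."""
    return zlib.compress(xml_bytes, 6)


def decompress_metadata(blob: bytes) -> bytes:
    """Restore the original XML, e.g. at publish time."""
    return zlib.decompress(blob)


def fits_in_document(payload: bytes, limit: int = MONGODB_MAX_DOCUMENT_BYTES) -> bool:
    """Check whether a payload alone stays within the document size limit."""
    return len(payload) <= limit


if __name__ == "__main__":
    raw = build_filelist_xml(60000)
    packed = compress_metadata(raw)
    print(len(raw), len(packed), fits_in_document(packed))
```

Repetitive path lists like this tend to compress well, but every publish would then pay for a decompression step per package, which is the performance question raised above.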

In any case, I hope this is helpful. Let me know if you have any additional questions.

Michael

[0] https://packagecloud.io/chef/stable/el/6/x86_64/
[1] https://packagecloud.io/chef/stable/install
[2] https://packagecloud.io/chef/stable/el/7/x86_64/repodata/primary.xml.gz
[3] $ curl -I https://packagecloud.io/chef/stable/el/7/x86_64/repodata/primary.xml.gz
HTTP/1.1 302 Found
Server: nginx/1.1.19
Date: Sat, 31 Jan 2015 17:20:40 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Connection: keep-alive
Status: 302 Found
Location: https://packagecloud-repositories.s3.amazonaws.com/empty/rpm/primary.xml.gz?AWSAccessKeyId=AKIAI44QGWC7C5WEV4XA&Signature=Wq80Dw1MhI9kFe8OoB3puB6kJmw%3D&Expires=1422725140
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Cache-Control: no-cache
X-Request-Id: a73f3650-74b6-4b3c-868e-3fbc4b345282
X-Runtime: 0.068724
Strict-Transport-Security: max-age=31536000
X-Frame-Options: DENY


----- Original Message -----
> From: "David Somers-Harris | David | OPS" 
> <david.somers-harris at mail.rakuten.com>
> To: pulp-list at redhat.com
> Sent: Saturday, January 31, 2015 9:07:51 AM
> Subject: [Pulp-list] Pulp and package.io
> 
> 
> 
> Hello,
> 
> 
> 
> I’m trying to sync the Chef repository (hosted in package.io) into 
> Foreman
> (theforeman.org) which uses Pulp but I’m not having any luck.
> 
> When I contacted support at package.io, it turns out that they store 
> everything in S3 storage in the background and only expose over http 
> what yum needs to be able to see the packages.
> 
> 
> 
> However, this is apparently not enough for pulp to be able to do 
> repo-discovery and sync the repository.
> 
> What does pulp expect when it’s looking at a repository?
> 
> (e.g. it looks like pulp breaks if the actual URI of the rpm is not 
> the same as the URI of the directory structure)
> 
> Are these expectations documented somewhere?
> 
> 
> 
> In short I want to give the guys at package.io a list of what pulp 
> expects to see if there is anything they can do about supporting it.
> 
> 
> 
> 
> Thanks,
> 
> David
> 
> 
> 
> 
> 
> From: support.16458.940aef3ec148f754 at helpscout.net
> [mailto:support.16458.940aef3ec148f754 at helpscout.net] On Behalf Of 
> packagecloud.io support
> Sent: Friday, January 30, 2015 1:45 AM
> To: Somers-Harris, David | David | OPS
> Subject: Re: Repo Syncing
> 
> 
> 
> Joe, Jan 29 4:44pm
> 
> Yes we are using S3. It's likely that pulp and similar tools would use 
> the actual metadata found in the repository as opposed to traversing 
> the directory structure itself.
> 
> Can you share some example URLs that work and I can show you similar 
> URLs on packagecloud? In theory, pulp should simply need to know where 
> to find the yum metadata and everything else will be taken care of itself.
> 
> --
> Joe Damato
> support at packagecloud.io
> 
> David Somers-Harris, Jan 29 9:34am
> 
> 
> Hi Joe,
> 
> 
> 
> Thanks for the reply.
> 
> 
> 
> Foreman uses Pulp for handling its repositories.
> 
> http://www.pulpproject.org/
> 
> 
> 
> I think it basically does an HTTP scrape with something similar to rsync.
> 
> We don't mind hosting large amounts of data locally; it gives us more 
> control and reduces our bandwidth usage.
> 
> 
> 
> I think Package Cloud would either need to simulate the full directory 
> over http or Pulp would need to have a plugin to understand your API.
> 
> Do you use object storage compatible with S3?
> 
> 
> 
> 
> Regards,
> 
> David Somers-Harris
> 
> Global Operations Department
> 
> 
> Joe, Jan 26 7:58am
> 
> 
> Hi David:
> 
> No, that's not possible because packagecloud doesn't work that way -- 
> there are no actual directories mapped to a filesystem as you would 
> get if you were using createrepo. I have no idea how Foreman works, 
> but if you can provide more details on how Foreman's syncing/mirroring 
> works, I can probably help you figure out what you need to do to accomplish this.
> packagecloud serves up files and metadata at URLs that yum and apt 
> expect but those URLs are just an abstraction over how we store the data.
> 
> Keep in mind that packagecloud is actually able to retain all previous 
> versions of uploaded packages, which means that if you are mirroring 
> the entire Chef Stable Enterprise Linux repository for any individual 
> version of Enterprise Linux, you will be consuming *considerable* disk 
> space on your side.
> 
> --
> Joe Damato
> support at packagecloud.io
> 
> David Somers-Harris, Jan 26 7:24am
> 
> 
> Hello,
> 
> 
> 
> 
> I would like to see a directory listing under 
> https://packagecloud.io/chef/stable/el so that I can sync it to my local 
> repo in Foreman.
> Is this possible?
> 
> 
> 
> 
> Thanks,
> David
> 
> 
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list



