[Pulp-list] Download analytics from CDN?
daviddavis at redhat.com
Wed Apr 7 15:39:06 UTC 2021
Interesting. Keep us posted.
On Tue, Apr 6, 2021 at 9:37 PM Danny Sauer <danny.sauer at konghq.com> wrote:
> Thanks for following up. Yes, the query string *should* be there. I found
> this bug last week when I was looking into it, though (basically, telling
> django-storages to use CloudFront breaks the query string appending code).
> I'm back from away-from-keyboard vacation tomorrow, and should be able to
> get some patches sent upstream. :)
> On Tue, Apr 6, 2021, 2:07 PM David Davis <daviddavis at redhat.com> wrote:
>> Hi Danny,
>> I don't know much about AWS logging but Pulp does set the filename in the
>> response-content-disposition. Could that be used to determine the
>> filename for each request?
>> If not, I'm looking at the boto3 docs for get_object to see if there's
>> another parameter we could set to help you track the filename in requests
>> but I'm not seeing anything useful. My knowledge of S3 is a bit limited so if
>> you have a suggestion how we can construct a request to S3 that would help
>> you to track the filenames of requests to S3, I could probably look at how
>> we could support it in Pulp 3.
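
A minimal sketch of the idea above, assuming the redirect URL carries a
response-content-disposition query parameter and that the CDN log records the
query string as it was sent. The function name and the sample query string
are hypothetical, and the encoding actually seen in a given log may differ:

```python
from email.message import Message
from typing import Optional
from urllib.parse import parse_qs


def filename_from_query(query: str) -> Optional[str]:
    """Return the filename named in response-content-disposition, if any."""
    # parse_qs percent-decodes the values for us.
    values = parse_qs(query).get("response-content-disposition")
    if not values:
        return None
    # Reuse the stdlib header parser instead of splitting on ';' by hand.
    msg = Message()
    msg["Content-Disposition"] = values[0]
    return msg.get_filename()


# Hypothetical logged query string, for illustration only.
sample = (
    "response-content-disposition=attachment%3Bfilename%3Dfoo-1.0-1.x86_64.rpm"
    "&X-Amz-Expires=3600"
)
print(filename_from_query(sample))
```

An analytics step could run something like this over the query field of each
log line before handing the data to an off-the-shelf tool.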
>> On Tue, Mar 30, 2021 at 10:43 AM Danny Sauer <danny.sauer at konghq.com> wrote:
>>> I've got Pulp set up to serve all the content from S3 behind
>>> CloudFront. This works really well, except for a minor issue: the content
>>> URLs are all the UUIDs for artifacts, not, for example, the pretty name of
>>> the RPM being downloaded. That's an issue in my situation because we'd
>>> really like to generate download analytics using off-the-shelf tools which
>>> consume the AWS CDN standard log format.
>>> My initial thought was that it might be easy to have the redirects
>>> include a query string in the generated URL which notes the original
>>> filename or relative path requested. But I don't have sufficiently
>>> developed Django skills to know the easiest way to do that (or if it's even
>>> reasonable to think that's easy). Using the content server's logs is
>>> another option, but I have some other content on the same S3 bucket which
>>> may not necessarily be reached solely through Pulp's content server, so
>>> that means two log locations, etc. If it was easy to make Django /
>>> Gunicorn log to an S3 bucket in a manner similar to CloudFront, that might
>>> also be ok. Post-processing logs with a series of API calls to work out
>>> what artifact maps to what repository content would ideally be a last
>>> resort.
>>> Anyone have some great insights which might help me out here? :) If it
>>> helps, I'm building my own Docker images which ultimately run in EKS. So
>>> patches / extra modules are an option, but I'd prefer to stay as close to
>>> vanilla upstream as possible with environment variable-based config
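
A rough sketch of the query-string idea from the original message, using only
the Python standard library. The function name, the "filename" parameter key,
and the example URL are all hypothetical; this is not how Pulp or
django-storages builds its redirect URLs. If the URL is signed, the parameter
would presumably have to be added before signing rather than afterwards:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url_with_filename(url: str, relative_path: str, key: str = "filename") -> str:
    """Append the requested relative path to a URL as a query parameter."""
    parts = urlsplit(url)
    # Keep any parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, relative_path))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical artifact URL and relative path, for illustration only.
tagged = tag_url_with_filename(
    "https://cdn.example.com/artifact/ab/cdef0123",
    "Packages/f/foo-1.0-1.x86_64.rpm",
)
print(tagged)
```

The pretty name would then appear in the query field of the CDN log next to
the UUID-based path.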