[Pulp-list] Download analytics from CDN?

Danny Sauer danny.sauer at konghq.com
Tue Apr 20 03:49:50 UTC 2021


That patch broke signed cloudfront URLs, as the S3 content-disposition
query string has to be included in the URL which gets signed.  Sigh.  So,
there's a PR upstream at
https://github.com/jschneier/django-storages/pull/1004 which resolves
that.  PR 1003 also allows the signing key to work easily when passed in
via env variable.  There's a forked django-storages with both incorporated
at https://github.com/Kong/django-storages/tree/release/kong-prod, should
someone else need to use that before upstream integrates the changes.  It's
fairly easy to just
`pip3 install "django-storages[boto3] @ git+https://github.com/Kong/django-storages@release/kong-prod"`
instead of the usual documented step.  Hope that helps someone. :)

--Danny

On Fri, Apr 16, 2021 at 11:35 AM Danny Sauer <danny.sauer at konghq.com> wrote:

> FWIW, I worked around this by just patching pulp in our adjusted docker
> build.
>
> https://github.com/Kong/docker-pulp/blob/main/pulp-core/patches/content_parameter_filename_fix.patch
>
> Upstream patch hasn't been submitted yet because I'm still scrambling to
> get this implemented before our current hosted provider goes away.  Which
> is also why it took a week to share the workaround I had in place last week
> (and why my documentation PRs still don't have issues associated). :D
>
> I know the project is preferring to go the Kube operator / Ansible route,
> but speaking of Docker and Kubernetes and a CDN, we do have a helm chart
> for this whole thing that I'm hoping we can open source soon as well.
> Someday...
>
> --Danny
>
> On Wed, Apr 7, 2021 at 10:39 AM David Davis <daviddavis at redhat.com> wrote:
>
>> Interesting. Keep us posted.
>>
>> David
>>
>>
>> On Tue, Apr 6, 2021 at 9:37 PM Danny Sauer <danny.sauer at konghq.com>
>> wrote:
>>
>>> Thanks for following up. Yes, the query string *should* be there. I
>>> found this bug last week when I was looking into it, though (basically,
>>> telling Django-storages to use cloudfront breaks the query string appending
>>> code). I'm back from away-from-keyboard vacation tomorrow, and should be
>>> able to get some patches sent upstream. :)
>>>
>>> https://github.com/jschneier/django-storages/issues/997
>>>
>>> --Danny
>>>
>>> On Tue, Apr 6, 2021, 2:07 PM David Davis <daviddavis at redhat.com> wrote:
>>>
>>>> Hi Danny,
>>>>
>>>> I don't know much about AWS logging but Pulp does set the filename in
>>>> the response-content-disposition[0]. Could that be used to determine the
>>>> filename for each request?
>>>>
>>>> If not, I'm looking at the boto3 docs for get_object[1] to see if
>>>> there's another parameter we could set to help you track the filename in
>>>> requests, but I'm not seeing anything useful. My knowledge of S3 is a bit
>>>> limited so if you have a suggestion how we can construct a request to S3
>>>> that would help you to track the filenames of requests to s3, I could
>>>> probably look at how we could support it in Pulp 3.
>>>>
>>>> [0]
>>>> https://github.com/pulp/pulpcore/blob/f38f955425b185749b3c8d4d878a7e166cfc05b9/pulpcore/content/handler.py#L613-L614
>>>> [1]
>>>> https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
>>>>
>>>> David
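David's suggestion can be sketched against CloudFront's standard log format. Once the redirect URL actually carries `response-content-disposition` (which the django-storages fix is about), the filename shows up in the `cs-uri-query` column. The sketch below is an illustration under assumptions, not a Pulp feature. It reads column names from the log's own `#Fields:` header and assumes a disposition value shaped like `attachment;filename=<name>`. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs, unquote


def filename_from_disposition(value: str) -> Optional[str]:
    # Hypothetical helper: pull "filename=..." out of a Content-Disposition
    # value such as "attachment;filename=foo-1.0.rpm".
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep and key.lower() == "filename":
            return val.strip('"')
    return None


def count_downloads(log_lines: Iterable[str]) -> Counter:
    """Count requests per filename in CloudFront standard (access) log lines."""
    fields: list = []
    counts: Counter = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names come from the log's own #Fields directive.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split("\t")))  # data rows are tab-separated
        query = row.get("cs-uri-query", "-")
        if query == "-":  # "-" means the request had no query string
            continue
        for disposition in parse_qs(query).get("response-content-disposition", []):
            # Assumption: the logged query may be percent-encoded a second time,
            # so one extra unquote is applied; drop it if your logs are not.
            name = filename_from_disposition(unquote(disposition))
            if name:
                counts[name] += 1
    return counts
```

As written, it counts every logged request, including errors and partial (206) responses. Filtering on `sc-status` would be a natural next step.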
>>>>
>>>>
>>>> On Tue, Mar 30, 2021 at 10:43 AM Danny Sauer <danny.sauer at konghq.com>
>>>> wrote:
>>>>
>>>>> I've got Pulp set up to serve all the content from S3 behind
>>>>> CloudFront.  This works really well, except for a minor issue: the content
>>>>> URLs are all the UUIDs for artifacts, not, for example, the pretty name of
>>>>> the RPM being downloaded.  That's an issue in my situation because we'd
>>>>> really like to generate download analytics using off-the-shelf tools which
>>>>> consume the AWS CDN standard log format.
>>>>>
>>>>> My initial thought was that it might be easy to have the redirects
>>>>> include a query string in the generated URL which notes the original
>>>>> filename or relative path requested.  But I don't have sufficiently
>>>>> developed Django skills to know the easiest way to do that (or if it's even
>>>>> reasonable to think that's easy).  Using the content server's logs is
>>>>> another option, but I have some other content on the same S3 bucket which
>>>>> may not necessarily be reached solely through Pulp's content server, so
>>>>> that means two log locations, etc.  If it was easy to make Django /
>>>>> Gunicorn log to an S3 bucket in a manner similar to Cloudfront, that might
>>>>> also be ok.  Post-processing logs with a series of API calls to work out
>>>>> what artifact maps to what repository content would ideally be a last
>>>>> resort.
>>>>>
>>>>> Anyone have some great insights which might help me out here? :)  If
>>>>> it helps, I'm building my own Docker images which ultimately run in EKS.
>>>>> So patches / extra modules are an option, but I'd prefer to stay as close
>>>>> to vanilla upstream as possible with environment variable-based config
>>>>> adjustments.
>>>>>
>>>>> Thanks.
>>>>> --Danny
>>>>> _______________________________________________
>>>>> Pulp-list mailing list
>>>>> Pulp-list at redhat.com
>>>>> https://listman.redhat.com/mailman/listinfo/pulp-list
>>>>
>>>>