[Pulp-list] Download analytics from CDN?
danny.sauer at konghq.com
Fri Apr 16 16:35:22 UTC 2021
FWIW, I worked around this by just patching pulp in our adjusted docker
Upstream patch hasn't been submitted yet because I'm still scrambling to
get this implemented before our current hosted provider goes away. Which
is also why it took a week to share the workaround I had in place last week
(and why my documentation PRs still don't have issues associated). :D
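
For anyone following along, here is a minimal sketch of the query-string idea
discussed further down in this thread. It is not the actual patch: the function
name tag_redirect_url and the parameter name pulp_path are made up for
illustration, and it assumes the redirect URL is unsigned. With signed S3 or
CloudFront URLs the extra parameter would most likely have to be added before
signing, since changing the URL afterwards would invalidate the signature.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_redirect_url(url: str, rel_path: str, param: str = "pulp_path") -> str:
    """Return `url` with `param=rel_path` appended to its query string.

    `param` is a hypothetical name; any key the log tooling can read would do.
    """
    parts = urlsplit(url)
    # Keep whatever query parameters the storage backend already generated.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, rel_path))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical artifact URL and relative path, for illustration only.
    print(tag_redirect_url(
        "https://cdn.example.com/artifact/ab/cdef0123",
        "Packages/e/example-1.0-1.x86_64.rpm",
    ))
```

The point is only that the requested relative path ends up in the query string,
so it would show up in the CDN access logs (provided query strings are logged)
next to the artifact path.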
I know the project is preferring to go the Kube operator / Ansible route,
but speaking of Docker and Kubernetes and a CDN, we do have a helm chart
for this whole thing that I'm hoping we can open source soon as well.
On Wed, Apr 7, 2021 at 10:39 AM David Davis <daviddavis at redhat.com> wrote:
> Interesting. Keep us posted.
> On Tue, Apr 6, 2021 at 9:37 PM Danny Sauer <danny.sauer at konghq.com> wrote:
>> Thanks for following up. Yes, the query string *should* be there. I found
>> this bug last week when I was looking into it, though (basically, telling
>> django-storages to use CloudFront breaks the query string appending code).
>> I'm back from away-from-keyboard vacation tomorrow, and should be able to
>> get some patches sent upstream. :)
>> On Tue, Apr 6, 2021, 2:07 PM David Davis <daviddavis at redhat.com> wrote:
>>> Hi Danny,
>>> I don't know much about AWS logging but Pulp does set the filename in
>>> the response-content-disposition. Could that be used to determine the
>>> filename for each request?
>>> If not, I'm looking at the boto3 docs for get_object to see if
>>> there's another parameter we could set to help you track the filename in
>>> requests but I'm not seeing anything useful. My knowledge of S3 is a bit
>>> limited so if you have a suggestion how we can construct a request to S3
>>> that would help you to track the filenames of requests to S3, I could
>>> probably look at how we could support it in Pulp 3.
>>> On Tue, Mar 30, 2021 at 10:43 AM Danny Sauer <danny.sauer at konghq.com> wrote:
>>>> I've got Pulp set up to serve all the content from S3 behind
>>>> CloudFront. This works really well, except for a minor issue: the content
>>>> URLs are all the UUIDs for artifacts, not, for example, the pretty name of
>>>> the RPM being downloaded. That's an issue in my situation because we'd
>>>> really like to generate download analytics using off-the-shelf tools which
>>>> consume the AWS CDN standard log format.
>>>> My initial thought was that it might be easy to have the redirects
>>>> include a query string in the generated URL which notes the original
>>>> filename or relative path requested. But I don't have sufficiently
>>>> developed Django skills to know the easiest way to do that (or if it's even
>>>> reasonable to think that's easy). Using the content server's logs is
>>>> another option, but I have some other content on the same S3 bucket which
>>>> may not necessarily be reached solely through Pulp's content server, so
>>>> that means two log locations, etc. If it was easy to make Django /
>>>> Gunicorn log to an S3 bucket in a manner similar to CloudFront, that might
>>>> also be ok. Post-processing logs with a series of API calls to work out
>>>> what artifact maps to what repository content would ideally be a last
>>>> Anyone have some great insights which might help me out here? :) If it
>>>> helps, I'm building my own Docker images which ultimately run in EKS. So
>>>> patches / extra modules are an option, but I'd prefer to stay as close to
>>>> vanilla upstream as possible with environment variable-based config