[Pulp-dev] pulp 3 upload API validation

Mon Jul 10 19:36:18 UTC 2017

On Mon, Jul 10, 2017 at 3:26 PM, Michael Hrivnak <mhrivnak at redhat.com>
wrote:

>
>
> On Mon, Jul 10, 2017 at 3:06 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>
>> The upload API for Artifacts is going to allow users to specify the
>> artifact size and a digest. The Artifact model currently supports 'md5',
>> 'sha1', 'sha224', 'sha256', 'sha384', and 'sha512' digests.
>>
>> Do we want to let users specify more than one digest per upload? e.g. md5
>> and sha256?
>>
>
> There may be no harm in this, but it would add complexity to the
> verification and not add much value. I'd stick with just one unless there's
> a compelling reason for multiple.
>

I agree. The API is going to raise a validation exception when more than
one digest is provided.
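
A minimal sketch of that check, assuming the upload fields arrive as a
plain dict keyed by digest name. The function name and exception class
below are hypothetical placeholders, not the actual Pulp 3 API:

```python
# Digest names taken from the list earlier in this thread.
SUPPORTED_DIGESTS = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')


class UploadValidationError(ValueError):
    """Hypothetical exception type; the real API may raise something else."""


def validate_upload_fields(fields):
    """Return (algorithm, value) for the one digest in `fields`.

    Returns (None, None) when no digest was supplied and raises
    UploadValidationError when more than one was supplied.
    """
    provided = [(name, fields[name])
                for name in SUPPORTED_DIGESTS if fields.get(name)]
    if len(provided) > 1:
        names = ', '.join(name for name, _ in provided)
        raise UploadValidationError(
            'Only one digest may be provided per upload, got: ' + names)
    return provided[0] if provided else (None, None)
```

With this sketch, passing both 'md5' and 'sha256' raises the error, while
passing only 'sha256' returns that pair unchanged.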

>
>
>>
>> Do we want to store all 6 digests for each Artifact?
>>
>
> The expensive part of calculating the digests is reading the file. As long
> as you're already reading the entire file, which we will during
> verification, you may as well stuff the bits through multiple hashers
> (digesters?) and get all the digests. Pulp 2 has a function that does this:
>
> https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353
>
> But we can't always guarantee that we'll have all the checksums available,
> for at least two reasons. 1) If in the future we want to use yet another
> algorithm, we probably won't want to run a migration that re-reads every
> file and calculates the additional digest. 2) For on-demand content, we
> don't have it locally, so we can't calculate any additional checksums until
> it gets fetched.
>
> So this may be one of those times where we use a good-ole-fashioned getter
> method that returns the requested digest if it's on the artifact,
> calculates it if not, or raises an exception if the value isn't available
> and can't be calculated.
>

For uploaded Artifacts, all of the digests will be calculated as the file
is being processed during the upload. So I don't think calculating all of
them should incur significantly more cost than just one. The code snippet
from Pulp 2 looks similar to what I am doing.
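
Roughly, the single-pass idea looks like the sketch below. This is an
illustration only, not the Pulp 2 function or the actual upload code; the
function name and chunk size are assumptions:

```python
import hashlib

# Digest names taken from the list earlier in this thread.
DIGEST_NAMES = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')


def calculate_digests(file_obj, chunk_size=1024 * 1024):
    """Read a binary file object once and return (size, {name: hexdigest})."""
    hashers = {name: hashlib.new(name) for name in DIGEST_NAMES}
    size = 0
    # Each chunk is read from the file once and fed to every hasher, so the
    # I/O cost is paid a single time no matter how many digests we keep.
    for chunk in iter(lambda: file_obj.read(chunk_size), b''):
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    return size, {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The returned size and the one digest the user supplied can then be
compared against the computed values before the Artifact is saved.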

I haven't given much thought to the getter, but your idea sounds fine to
me.
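
For discussion, here is one way such a getter could look. It is a sketch
under assumptions: the Artifact class, its attributes, and the exception
name are all hypothetical stand-ins rather than the real model:

```python
import hashlib


class DigestUnavailable(Exception):
    """Hypothetical error for a digest that is neither stored nor computable."""


class Artifact:
    """Stand-in for the real model; attribute names are assumptions."""

    def __init__(self, digests=None, path=None):
        self.digests = dict(digests or {})  # e.g. {'sha256': '...'}
        self.path = path  # None for on-demand content not yet fetched

    def get_digest(self, name):
        # 1) Return the stored value if we already have it.
        if name in self.digests:
            return self.digests[name]
        # 2) Without a local file there is nothing to calculate from.
        if self.path is None:
            raise DigestUnavailable(name)
        # 3) Otherwise read the file, calculate the digest, and remember it.
        hasher = hashlib.new(name)
        with open(self.path, 'rb') as file_obj:
            for chunk in iter(lambda: file_obj.read(1024 * 1024), b''):
                hasher.update(chunk)
        self.digests[name] = hasher.hexdigest()
        return self.digests[name]
```

Whether the calculated value should be cached in memory, as above, or
written back to the database is an open design question.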

Thanks,
Dennis

>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>