[Pulp-dev] pulp 3 upload API validation
Jeff Ortel
jortel at redhat.com
Tue Jul 11 18:23:31 UTC 2017
On 07/11/2017 12:27 PM, Dennis Kliban wrote:
> On Tue, Jul 11, 2017 at 1:20 PM, Brian Bouterse <bbouters at redhat.com <mailto:bbouters at redhat.com>> wrote:
>
> We should not raise a validation exception because due to semver we can't stop raising that exception
> later. Specifically if we want to ever allow double checksum being specified in the future.
>
> For the MVP, I think the choices are: only respect sha256 and ignore the rest OR do as many standard
> validators as they specify. I'm ok with either of these with a slight preference for validating as many of
> them as are specified would be the best. Since we're already handling all of the data at save time,
> pushing it through additional digest validators will cost a bit of cpu but not much additional I/O. Also
> if the user asked us to do that validation then having it demand some additional resources is OK. Also
> having feature parity with the changesets is good. The changesets can validate all standard digest
> validators so having uploads do the same would be consistent.
>
>
> I am reversing what I had previously said, and I agree that we should not raise an exception if a user
> provides more than one checksum at upload time. My current implementation checks every checksum specified by
> the user.
>
>
I don't think I understand the purpose of an upload user specifying more than 1 algorithm/digest for the
platform to validate the integrity of an uploaded file. Why would the user specify an algorithm/digest they
don't trust?
>
>
> On Mon, Jul 10, 2017 at 5:11 PM, Jeff Ortel <jortel at redhat.com <mailto:jortel at redhat.com>> wrote:
>
>
>
> On 07/10/2017 02:36 PM, Dennis Kliban wrote:
> > On Mon, Jul 10, 2017 at 3:26 PM, Michael Hrivnak <mhrivnak at redhat.com <mailto:mhrivnak at redhat.com>
> <mailto:mhrivnak at redhat.com <mailto:mhrivnak at redhat.com>>> wrote:
> >
> >
> >
> > On Mon, Jul 10, 2017 at 3:06 PM, Dennis Kliban <dkliban at redhat.com <mailto:dkliban at redhat.com> <mailto:dkliban at redhat.com
> <mailto:dkliban at redhat.com>>> wrote:
> >
> > The upload API for Artifacts is going to allow users to specify the artifact size and a digest. The
> > Artifact model currently supports 'md5', 'sha1', 'sha224', 'sha256', 'sha384', and 'sha512' digests.
> >
> > Do we want to let users specify more than one digest per upload? e.g. md5 and sha256?
> >
> >
> > There may be no harm in this, but it would add complexity to the verification and not add much value. I'd
> > stick with just one unless there's a compelling reason for multiple.
> >
> >
> > I agree. The API is going to raise a validation exception when more than 1 digest is provided.
>
> +1
>
> >
> >
> >
> >
> >
> > Do we want to store all 6 digests for each Artifact?
> >
> >
> > The expensive part of calculating the digests is reading the file. As long as you're already reading the
> > entire file, which we will during verification, you may as well stuff the bits through multiple hashers
> > (digesters?) and get all the digests. Pulp 2 has a function that does this:
> >
> > https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353
> <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353>
> > <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353
> <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353>>
> >
> > But we can't always guarantee that we'll have all the checksums available, for at least two reasons. 1) If
> > in the future if we want to use yet another algorithm, we probably won't want to run a migration that
> > re-reads every file and calculates the additional digest. 2) For on-demand content, we don't have it
> > locally, so we can't calculate any additional checksums until it gets fetched.
> >
> > So this may be one of those times where we use a good-ole-fashioned getter method that returns the
> > requested digest if it's on the artifact, calculates it if not, or raises an exception if the value isn't
> > available and can't be calculated.
> >
> >
> > For uploaded Artifacts, all of the digests will be calculated as the file is being processed during the
> > upload. So I don't think calculating all of them should incur significantly more cost than just one. The code
> > snippet from Pulp 2 looks similar to what I am doing.
>
> This functionality should be a method on the Artifact and not a util function somewhere.
>
> >
> > I haven't given much thought to the getter, but your idea sounds fine to me.
> > Thanks,
> > Dennis
> >
> >
> >
> >
> > --
> >
> > Michael Hrivnak
> >
> > Principal Software Engineer, RHCE
> >
> > Red Hat
> >
> >
> >
> >
> > _______________________________________________
> > Pulp-dev mailing list
> > Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
> > https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
> >
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
> https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
>
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
> https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 847 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20170711/50c4fd3a/attachment.sig>
More information about the Pulp-dev
mailing list