[Freeipa-devel] [PATCH] 360 be smarter about decoding certs

Fri Jan 29 16:26:43 UTC 2010

John Dennis wrote:
> On 01/29/2010 09:28 AM, Rob Crittenden wrote:
>> John Dennis wrote:
>>> On 01/28/2010 10:30 PM, Rob Crittenden wrote:
>>>> John Dennis wrote:
>>>>> On 01/28/2010 04:15 PM, Rob Crittenden wrote:
>>>>>> Gah, got the description mixed up with the last patch :-(
>>>>>>
>>>>>> Be a bit smarter about decoding certificates that might be base64
>>>>>> encoded. First see if it only contains those characters allowed before
>>>>>> trying to decode it. This reduces the number of false positives.
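A minimal Python 3 sketch of the idea described above: only attempt a base64 decode when *every* byte is in the base64 alphabet (whitespace allowed, `=` only as trailing padding). This is not the actual patch. `maybe_decode_base64` is a hypothetical name, and the `validate=True` argument exists in current Python 3 but not in the Python 2 code of this era.

```python
import base64
import binascii
import re

# Whole-input pattern: alphabet characters and whitespace, then at most two
# '=' padding characters at the end. Using fullmatch() means a single stray
# alphabet character is NOT enough to trigger a decode.
_B64_RE = re.compile(rb"[A-Za-z0-9+/\s]*={0,2}\s*")


def maybe_decode_base64(data: bytes) -> bytes:
    """Return decoded bytes if data looks like base64, else data unchanged."""
    if not _B64_RE.fullmatch(data):
        # Contains bytes outside the alphabet: treat it as binary (e.g. DER).
        return data
    try:
        # Drop line breaks, then let the strict decoder check the padding.
        return base64.b64decode(b"".join(data.split()), validate=True)
    except binascii.Error:
        return data
```

This is still a guess: a binary blob that happens to consist only of alphabet characters would be decoded anyway.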
>>>>>
>>>>> I'm not sure the test is doing what you want or even if it's the right
>>>>> test.
>>>>>
>>>>> The test is saying "If there are one or more characters in the base64
>>>>> alphabet, then try to decode." That means just about anything will
>>>>> match, which doesn't seem like a very strong test.
>>>>>
>>>>> Why not just try to decode it and let the decoder decide if it's
>>>>> really base64? The decoder has much stronger rules about the input,
>>>>> including ensuring the padding is correct.
>>>>>
>>>>
>>>> The reason is I had a binary cert that was successfully decoded by the
>>>> base64 decoder. I don't know the whys and wherefores, but there it is.
>>>
>>> Then testing to see if each byte is in the base64 alphabet would not
>>> have prevented this error.
>>
>> And yet it did in practice. I think you're assuming too much about the
>> input testing in base64.b64decode(). It gladly takes binary data, as
>> long as it fits the expected padding.
> 
> You're right, I just went and checked the code, it skips any char not in 
> the base64 alphabet :-(
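To illustrate the behaviour found here: in current Python 3, `base64.b64decode()` by default discards bytes outside the base64 alphabet before the padding check, so binary input can "decode" without error; passing `validate=True` makes it raise `binascii.Error` instead. The blob below is made up for illustration.

```python
import base64
import binascii

# Binary bytes interleaved with alphabet characters. With the default
# validate=False the decoder drops \x00, \xff, \x80 and \x01 and decodes
# the remaining "aGVsbG8=".
blob = b"\x00\xffaGVs\x80bG8=\x01"

lenient = base64.b64decode(blob)  # b'hello'

# With validate=True, the same input is rejected outright.
try:
    base64.b64decode(blob, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(lenient, strict_rejected)
```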
> 
>>>
>>> For a while now I've been feeling like we need to associate a format
>>> attribute with the certificate (e.g. DER, PEM, BASE64, etc.).
>>
>> There is simply no good way to carry that extra data when all you have
>> is a blob of data. We'd still need some mechanism to look at it and ask
>> "what are you?" That or we simply reject some types of input.
> 
> My concern is that correctly deducing what an object is just by scanning 
> its contents is not robust. As you've seen, it's easy to draw the wrong 
> conclusion. If instead the convention is "it must be an object in this 
> format" (e.g. canonical), then there is no reason to even ask the 
> question. That is simpler and more robust for most of our (internal) code; 
> we only have to worry about it at the interface boundaries.
> 
> So who enforces the canonical format? The only place we have to be 
> concerned is when it's user provided, any item we produce will be 
> guaranteed to be in the canonical format (hopefully :-). That just means 
> at our interface boundaries we *must* specify the canonical format.
> 
> If we're taking input from the user on the command line we offer them 
> the option of "input as pem", "input as der", "input as base64", try to 
> validate as best we can, trusting that the user has told us the correct 
> format, and then convert to the canonical format.
> 
> Think about the openssl x509 utilities; with those you must specify the 
> input format.
> 
> If we're taking input through an exposed API we do essentially the same 
> thing. Require the format be passed along with the data, validate as 
> best we can, and convert to the canonical format as it enters our system.
> 
> BTW, having the user/caller indicate the format they're providing 
> will make the validation more robust. For example, if the data is stated 
> to be in DER format, there is no reason to even try to see if it 
> can be base64 decoded, which might lead to a false positive. Likewise, if 
> it's stated to be in PEM format, it must have the header and footer.
> 
> Bottom line: I'm leery of trying to guess the format at random points; 
> it's too easy for the guessing logic to draw the wrong conclusion. I'd 
> much rather see it be explicit.
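A minimal sketch of the "caller declares the format, convert at the boundary" approach, assuming DER as the canonical format (the thread does not name one). `CertFormat` and `to_canonical_der()` are hypothetical names. The DER check is only a first-byte sanity test (an X.509 certificate is an ASN.1 SEQUENCE, tag 0x30), not a full parse.

```python
import base64
import binascii
from enum import Enum

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"
PEM_FOOTER = b"-----END CERTIFICATE-----"


class CertFormat(Enum):
    DER = "der"
    PEM = "pem"
    BASE64 = "base64"


def _strict_b64(text: bytes) -> bytes:
    # Strip line breaks, then let the decoder reject anything outside the
    # base64 alphabet or with bad padding.
    try:
        return base64.b64decode(b"".join(text.split()), validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 certificate data: {exc}") from exc


def to_canonical_der(data: bytes, fmt: CertFormat) -> bytes:
    """Convert a certificate in the caller-declared format to DER."""
    if fmt is CertFormat.DER:
        # Declared DER: never attempt a base64 decode, just sanity-check.
        if not data or data[0] != 0x30:
            raise ValueError("data does not start with an ASN.1 SEQUENCE tag")
        return data
    if fmt is CertFormat.PEM:
        # Declared PEM: the header and footer are mandatory.
        start = data.find(PEM_HEADER)
        end = data.find(PEM_FOOTER)
        if start == -1 or end == -1 or end < start:
            raise ValueError("PEM data is missing its BEGIN/END CERTIFICATE lines")
        body = data[start + len(PEM_HEADER):end]
        return to_canonical_der(_strict_b64(body), CertFormat.DER)
    if fmt is CertFormat.BASE64:
        return to_canonical_der(_strict_b64(data), CertFormat.DER)
    raise ValueError(f"unsupported format: {fmt!r}")
```

Because the format is stated up front, a DER blob never goes through base64 decoding, so it cannot produce the false positive discussed earlier in the thread.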

Perhaps, but validators take a single argument, so there is no way to pass 
in the type.

rob