[Freeipa-devel] [PATCH] 360 be smarter about decoding certs

Fri Jan 29 15:06:08 UTC 2010

On 01/29/2010 09:28 AM, Rob Crittenden wrote:
> John Dennis wrote:
>> On 01/28/2010 10:30 PM, Rob Crittenden wrote:
>>> John Dennis wrote:
>>>> On 01/28/2010 04:15 PM, Rob Crittenden wrote:
>>>>> Gah, got the description mixed up with the last patch :-(
>>>>>
>>>>> Be a bit smarter about decoding certificates that might be base64
>>>>> encoded. First see if it only contains those characters allowed before
>>>>> trying to decode it. This reduces the number of false positives.
>>>>
>>>> I'm not sure the test is doing what you want or even if it's the right
>>>> test.
>>>>
>>>> The test is saying "If there is one or more characters in the base64
>>>> alphabet then try and decode." That means just about anything will
>>>> match, which doesn't seem like a very strong test.
>>>>
>>>> Why not just try and decode it and let the decoder decide if it's
>>>> really base64? The decoder has much stronger rules about the input,
>>>> including assuring the padding is correct.
>>>>
>>>
>>> The reason is I had a binary cert that was correctly decoded by the
>>> base64 encoder. I don't know the why's and wherefores but there it is.
>>
>> Then testing to see if each byte is in the base64 alphabet would not
>> have prevented this error.
>
> And yet it did in practice. I think you're assuming too much about the
> input testing in base64.b64decode(). It gladly takes binary data, as
> long as it fits the expected padding.

You're right. I just went and checked the code: it skips any character
not in the base64 alphabet :-(
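
To make the behaviour concrete, here is a small sketch. It assumes
Python 3, where base64.b64decode() takes a validate flag; the helper
names and the sample bytes are made up for illustration and are not
taken from the patch.

```python
import base64
import binascii


def lenient_decode(data: bytes) -> bytes:
    # Default mode: bytes outside the base64 alphabet are silently
    # discarded before the padding check, so binary junk can "decode".
    return base64.b64decode(data)


def strict_decode(data: bytes) -> bytes:
    # validate=True rejects any byte outside the base64 alphabet.
    return base64.b64decode(data, validate=True)


# Hypothetical input: binary bytes wrapped around the four base64
# characters "QUJD" (which encode b"ABC").
noisy = b"\x82\x01QUJD\x02\xff"

print(lenient_decode(noisy))  # the junk bytes are skipped

try:
    strict_decode(noisy)
except binascii.Error as exc:
    print("rejected:", exc)
```

The lenient call accepts the blob because only the alphabet characters
that survive the skipping need to form a well-padded string.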

>>
>> For a while now I've been feeling like we need to associate a format
>> attribute to the certificate (e.g. DER, PEM, BASE64, etc.).
>
> There is simply no good way to carry that extra data when all you have
> is a blob of data. We'd still need some mechanism to look at it and ask
> "what are you?" That or we simply reject some types of input.

My concern is that deducing what an object is just by scanning its
contents is not robust. As you've seen, it's easy to draw the wrong
conclusion. If instead the convention is "it must be an object in this
format" (e.g. canonical), then there is no reason to even ask the
question. That is simpler and more robust for most of our (internal)
code; we only have to worry about it at the interface boundaries.

So who enforces the canonical format? The only place we have to be
concerned is when the data is user provided; any item we produce will be
guaranteed to be in the canonical format (hopefully :-). That just means
that at our interface boundaries we *must* specify the canonical format.

If we're taking input from the user on the command line we offer them 
the option of "input as pem", "input as der", "input as base64", try to 
validate as best we can trusting the user has told us the correct format 
and then convert to the canonical format.
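
As a rough sketch of what that could look like on the command line,
using Python's argparse: the --cert-format option name, the certfile
argument and the file name are hypothetical, not existing IPA options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Load a certificate (illustrative sketch only)")
    parser.add_argument("certfile", help="path to the certificate file")
    # Hypothetical option: the user must state the format, we never guess.
    parser.add_argument(
        "--cert-format",
        choices=("pem", "der", "base64"),
        required=True,
        help="format of the supplied certificate")
    return parser


# Example invocation with an explicit argument list.
args = build_parser().parse_args(["--cert-format", "der", "server.crt"])
print(args.cert_format, args.certfile)
```

Making the option required means a missing format is an error at the
boundary rather than something the rest of the code has to guess at.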

Think about the openssl x509 utilities, with those you must specify the 
input format.

If we're taking input through an exposed API we do essentially the same 
thing. Require the format be passed along with the data, validate as 
best we can, and convert to the canonical format as it enters our system.

BTW, having the user/caller indicate the format they're providing
will make the validation more robust. For example, if it's stated that
the data is in DER format, then there is no reason to even try to see if
it can be base64 decoded, which might lead to a false positive. Likewise,
if it's stated that it's in PEM format, it must have the header and footer.
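
A minimal sketch of that boundary conversion, assuming DER is chosen as
the canonical format and assuming Python 3. The function names are
hypothetical, and the DER check is only a shallow sanity test (an X.509
certificate is an ASN.1 SEQUENCE, so it starts with byte 0x30); a real
implementation would hand the result to a proper certificate parser.

```python
import base64
import binascii

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"
PEM_FOOTER = b"-----END CERTIFICATE-----"


def _strict_b64(text: bytes) -> bytes:
    # Drop line breaks and other whitespace, then refuse any byte that
    # is outside the base64 alphabet instead of silently skipping it.
    compact = b"".join(text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc


def _check_der(data: bytes) -> bytes:
    # Shallow check only: the outermost element must be a SEQUENCE.
    if not data.startswith(b"\x30"):
        raise ValueError("DER data must start with a SEQUENCE tag (0x30)")
    return data


def to_canonical_der(data: bytes, fmt: str) -> bytes:
    """Validate data according to the stated format and return DER."""
    if fmt == "der":
        # Stated as DER: never attempt a base64 decode.
        return _check_der(data)
    if fmt == "base64":
        return _check_der(_strict_b64(data))
    if fmt == "pem":
        stripped = data.strip()
        if not (stripped.startswith(PEM_HEADER)
                and stripped.endswith(PEM_FOOTER)):
            raise ValueError("PEM input must have the header and footer")
        body = stripped[len(PEM_HEADER):-len(PEM_FOOTER)]
        return _check_der(_strict_b64(body))
    raise ValueError(f"unknown format: {fmt!r}")
```

Everything past this function can then assume DER and never has to ask
"what are you?" again.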

Bottom line: I'm leery of trying to guess at random points what the
format is. It's too easy for the guessing logic to draw the wrong
conclusion, and I'd much rather see it be explicit.

>
>> Or we need to adopt a convention that certs are always in one
>> canonical format and the interface is responsible for assuring what it
>> accepts as input is converted to the canonical form.
>
> Again, something would need to do that and base64.b64decode() is not
> sufficient.
>
> I know this seems rather hacky, I thought as much when I coded it, just
> trying to make it robust.
>
> rob
>
>>
>>> I see what you mean about my regex being a bit weak though, it really
>>> should require that the entire string conform. I'll see what I can do.
>>>
>>> rob
>>
>>
>

-- 
John Dennis <jdennis at redhat.com>
