[Freeipa-devel] JSON problems (the woes of binary data)

John Dennis jdennis at redhat.com
Fri Feb 26 20:59:53 UTC 2010


The Problem:
------------

I've been looking at the encoding exception which is being thrown when 
you click on the "Services" menu item in our current implementation. By 
default we seem to be using JSON as our RPC mechanism. The exception is 
being thrown when the JSON encoder hits a certificate. Recall that we 
store certificates in LDAP as binary data and in our implementation we 
distinguish binary data from text by Python object type: text is 
*always* a unicode object and binary data is *always* a str object. 
However, in Python 2.x str objects are assumed to be text and are 
subject to encoding/decoding in many parts of the Python world.

Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are 
*only* unicode strings. So what is happening is that when the JSON 
encoder sees our certificate data in a str object it says "str objects 
are text, so we have to decode this str object as UTF-8 to get unicode". 
There's the problem! It's completely nonsensical to try to interpret 
arbitrary binary data as UTF-8 text.
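A quick way to see the failure, sketched in Python 3 (where bytes plays the role of our 2.x str objects). Python 3's json module refuses bytes outright with a TypeError instead of attempting the UTF-8 decode, but the root cause is the same: there is no JSON type to hold raw binary. The certificate bytes and the try_json helper are made-up placeholders, not real IPA code or a real certificate.

```python
import json

# Placeholder "certificate": arbitrary bytes that are not valid UTF-8.
# (0x30 0x82 is how a DER SEQUENCE commonly starts; the rest is made up.)
fake_der_cert = b"\x30\x82\x01\x0a\xff\xfe\x00\x80"

def try_json(value):
    """Return the JSON text for value, or the name of the exception raised."""
    try:
        return json.dumps(value)
    except (TypeError, UnicodeDecodeError) as exc:
        return type(exc).__name__

# JSON has no binary type, so the raw bytes cannot be serialized.
print(try_json({"usercertificate": fake_der_cert}))  # TypeError

# Treating the bytes as UTF-8 "text" is not an option either.
try:
    fake_der_cert.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```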

The right way to handle this is to encode the binary data to base64 
ASCII text and then hand it to JSON. FWIW our XML-RPC handler does this 
already because XML-RPC knows about binary data and elects to 
encode/decode it to base64 as it's marshaled and unmarshaled. But JSON 
can't do this during marshaling and unmarshaling because the JSON 
protocol has no concept of binary data.
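To make the contrast concrete, here is a minimal Python 3 sketch using the standard library's xmlrpc.client and json modules (the placeholder bytes are made up). XML-RPC's Binary wrapper makes the marshaller emit a base64 element that the unmarshaller turns back into bytes, while with JSON the caller has to base64 encode, and later decode, by hand.

```python
import base64
import json
import xmlrpc.client

fake_der_cert = b"\x30\x82\x01\x0a\xff\xfe\x00\x80"  # placeholder bytes

# XML-RPC: wrapping the bytes in Binary makes the marshaller emit <base64>.
xml_text = xmlrpc.client.dumps((xmlrpc.client.Binary(fake_der_cert),))
print("<base64>" in xml_text)  # True

# ...and the unmarshaller knows to turn <base64> back into bytes.
params, _method = xmlrpc.client.loads(xml_text, use_builtin_types=True)
print(params[0] == fake_der_cert)  # True

# JSON: the caller must base64 encode explicitly; the wire value is plain text.
json_text = json.dumps(
    {"usercertificate": base64.b64encode(fake_der_cert).decode("ascii")}
)

# The receiver only gets a str back; decoding it is entirely up to the caller.
received = json.loads(json_text)["usercertificate"]
print(base64.b64decode(received) == fake_der_cert)  # True
```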

The Python JSON encoder class does give us the option to hook into the 
encoder, check whether the object is a str object, and if so base64 
encode it. But that doesn't help us at the opposite end. How would we know when 
unmarshaling that a given string is supposed to be base64 decoded back 
into binary data? We could prepend a special string and hope that string 
never gets used by normal text (yuck). Keeping a list of what needs 
base64 decoding is not an option within JSON because at the time of 
decoding we have no information available about the context of the JSON 
objects.
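A short Python 3 sketch of why the encoder hook alone is not enough (the encoder subclass is hypothetical, built on json.JSONEncoder.default). Once bytes are base64 encoded on the way out, the wire form is indistinguishable from a user who legitimately sent the same text, so the decoder has nothing to go on.

```python
import base64
import json

class Base64BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder hook: emit bytes as base64 text."""

    def default(self, o):
        if isinstance(o, bytes):
            return base64.b64encode(o).decode("ascii")
        return super().default(o)

binary_payload = {"data": b"ABC"}  # binary data, handled by the hook
text_payload = {"data": "QUJD"}    # a user who literally typed "QUJD"

wire_binary = json.dumps(binary_payload, cls=Base64BytesEncoder)
wire_text = json.dumps(text_payload, cls=Base64BytesEncoder)

# Both produce {"data": "QUJD"}: the receiver cannot tell which one
# should be base64 decoded back into binary.
print(wire_binary == wire_text)  # True
```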

That means if we want to use JSON we really should push the base64 
encode/decode to the parts of the code which have a priori knowledge 
about the objects they're pushing through the command interface. This 
would mean any command which passes a certificate should base64 encode 
it prior to sending it and base64 decode it after it comes back from a 
command result. Actually it would be preferable to use PEM encoding, and 
by the way, the whole reason PEM encoding for certificates was 
developed was exactly this scenario: transporting a certificate 
through a text based interchange mechanism!
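A minimal sketch of pushing the conversion into the code that knows it is handling a certificate, written in Python 3 with the standard library's ssl.DER_cert_to_PEM_cert and ssl.PEM_cert_to_DER_cert. The "cert_add" method name and the request layout are hypothetical, and the DER bytes are a placeholder (these helpers only base64 wrap/unwrap; they do not parse the certificate).

```python
import json
import ssl

fake_der_cert = b"\x30\x82\x01\x0a\xff\xfe\x00\x80"  # placeholder, not a real certificate

# Sender: this (hypothetical) command knows the argument is a certificate,
# so it converts DER -> PEM before the value ever reaches the JSON layer.
pem = ssl.DER_cert_to_PEM_cert(fake_der_cert)
request = json.dumps({"method": "cert_add", "params": {"certificate": pem}})

# Receiver: JSON just carries text; only code that knows "certificate"
# is a certificate converts PEM -> DER, and only when a binary consumer
# (e.g. an LDAP write) needs it.
received_pem = json.loads(request)["params"]["certificate"]
der_again = ssl.PEM_cert_to_DER_cert(received_pem)

print(received_pem.splitlines()[0])  # -----BEGIN CERTIFICATE-----
print(der_again == fake_der_cert)    # True
```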

Possible Solutions:
-------------------

As I see it we have these options in front of us for how to deal with 
this problem:

* Drop support for JSON, only use XML-RPC

* As soon as we read a certificate from LDAP, convert it to PEM 
format. Adopt the convention that any time we exchange certificates they 
will be in PEM format. Only convert from PEM format when the target 
demands binary (e.g. storing it in LDAP, passing it to a library 
expecting DER encoded data, etc.).

* Come up with some hacky protocol on top of JSON which signals "this 
string is really binary" and check for it on every JSON encode/decode 
and cross our fingers no one tries to send a legitimate string which 
would trigger the encode/decode.
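The third option might look something like this Python 3 sketch; the MARKER string and the mark/unmark helpers are hypothetical. Binary data round-trips, but any legitimate string that happens to start with the marker is silently turned into bytes on the receiving side, which is exactly the finger-crossing described above.

```python
import base64
import json

# Hypothetical marker; any choice can collide with real user text.
MARKER = "__base64__:"

def mark(value):
    """Tag bytes so the receiver knows to decode them."""
    if isinstance(value, bytes):
        return MARKER + base64.b64encode(value).decode("ascii")
    return value

def unmark(value):
    """Undo mark(): any string starting with the marker is treated as binary."""
    if isinstance(value, str) and value.startswith(MARKER):
        return base64.b64decode(value[len(MARKER):])
    return value

# Binary data survives the round trip...
wire = json.dumps(mark(b"\x00\xff"))
print(unmark(json.loads(wire)) == b"\x00\xff")  # True

# ...but a legitimate string that starts with the marker is corrupted.
user_text = "__base64__:QUJD"
print(repr(unmark(json.loads(json.dumps(mark(user_text))))))  # b'ABC'
```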

Question: Are certificates the one and only example of binary data we 
exchange?

Recommendation:
---------------

My personal recommendation is we adopt the convention that certificates 
are always PEM encoded. We've already run into many problems trying to 
deduce what format a certificate is in (e.g. binary, base64, PEM). I think 
it would be good if we just put a stake in the ground, said 
"certificates are always PEM encoded", and were done with all these 
problems we keep having with the data type of certificates.
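One way to hold that line, sketched in Python 3: a hypothetical require_pem() check applied wherever a certificate enters a command, which rejects anything that is not PEM instead of trying to guess the format. The header/footer constants are the standard PEM certificate markers, defined locally here rather than relying on module internals.

```python
import base64
import binascii
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"

def require_pem(value):
    """Hypothetical boundary check: accept only a PEM string, never guess."""
    if not isinstance(value, str):
        raise TypeError("certificate must be a PEM string, not %s" % type(value).__name__)
    text = value.strip()
    if not (text.startswith(PEM_HEADER) and text.endswith(PEM_FOOTER)):
        raise ValueError("certificate is not PEM encoded")
    # Strip the armor and whitespace; the body must be strict base64.
    body = "".join(text[len(PEM_HEADER):-len(PEM_FOOTER)].split())
    try:
        base64.b64decode(body, validate=True)
    except binascii.Error as exc:
        raise ValueError("PEM body is not valid base64: %s" % exc)
    return text + "\n"

# Usage: PEM passes through; raw DER bytes and bare base64 are refused.
good = ssl.DER_cert_to_PEM_cert(b"\x30\x82\x01\x0a")  # placeholder bytes
print(require_pem(good) == good)  # True

outcomes = []
for bad in (b"\x30\x82", "MIIBCg=="):
    try:
        require_pem(bad)
    except (TypeError, ValueError) as exc:
        outcomes.append(type(exc).__name__)
print(outcomes)  # ['TypeError', 'ValueError']
```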

As an aside, I'm also skeptical of the robustness of allowing binary data 
at all in our implementation. Trying to support binary data has been 
nothing but a headache and a source of many, many bugs. Do we really need it?

-- 
John Dennis <jdennis at redhat.com>
