[Freeipa-devel] JSON problems (the woes of binary data)

Rich Megginson rmeggins at redhat.com
Fri Feb 26 23:19:22 UTC 2010


John Dennis wrote:
> The Problem:
> ------------
>
> I've been looking at the encoding exception which is being thrown when 
> you click on the "Services" menu item in our current implementation. 
> By default we seem to be using JSON as our RPC mechanism. The 
> exception is being thrown when the JSON encoder hits a certificate. 
> Recall that we store certificates in LDAP as binary data and in our 
> implementation we distinguish binary data from text by Python object
> type: text is *always* a unicode object and binary data is *always* a
> str object. However, in Python 2.x str objects are assumed to be text
> and are subject to encoding/decoding in many parts of the Python world.
>
> Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are
> *only* unicode strings. So what is happening is that when the
> JSON encoder sees our certificate data in a str object it says "str
> objects are text and we have to produce a UTF-8 unicode encoding from
> that str object". There's the problem! It's completely nonsensical to
> try and encode binary data to UTF-8.
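To make the failure concrete, here is a minimal sketch of what goes wrong
when binary data is treated as UTF-8 text. It assumes Python 3 (the thread
is about Python 2.x, where the details of the exception differ), and
der_bytes is a made-up placeholder, not a real certificate.

```python
# Placeholder bytes that merely resemble the start of DER data.
# Assumption: illustrative only, not a real certificate.
der_bytes = b"\x30\x82\x01\x0a\xff\xfe"

try:
    # Treating arbitrary binary as UTF-8 text fails on invalid sequences.
    der_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```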
>
> The right way to handle this is to encode the binary data to base64 
> ASCII text and then hand it to JSON. FWIW our XML-RPC handler does 
> this already because XML-RPC knows about binary data and elects to 
> encode/decode it to base64 as it's marshaled and unmarshaled. But JSON 
> can't do this during marshaling and unmarshaling because the JSON
> protocol has no concept of binary data.
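A rough sketch of the base64 approach described above, assuming Python 3.
The field name "usercertificate" and the placeholder bytes are assumptions
chosen for illustration only.

```python
import base64
import json

# Placeholder binary value (assumption: not a real certificate).
der_bytes = b"\x30\x82\x01\x0a\xff\xfe\x00binary"

# Encode the binary data to base64 ASCII text before handing it to JSON.
payload = json.dumps(
    {"usercertificate": base64.b64encode(der_bytes).decode("ascii")}
)

# The receiving side has to know in advance that this field is base64.
received = json.loads(payload)
restored = base64.b64decode(received["usercertificate"])
```

Note that nothing in the JSON payload itself marks the field as binary; the
decode step only works because the receiver knows which field to decode.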
>
> The Python JSON encoder class does give us the option to hook into the
> encoder and check if the object is a str object and then base64 
> encode. But that doesn't help us at the opposite end. How would we 
> know when unmarshaling that a given string is supposed to be base64 
> decoded back into binary data? We could prepend a special string and 
> hope that string never gets used by normal text (yuck). Keeping a list 
> of what needs base64 decoding is not an option within JSON because at 
> the time of decoding we have no information available about the 
> context of the JSON objects.
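The one-sided nature of the encoder hook can be sketched as follows. This
assumes Python 3, where bytes values are passed to the encoder's default()
method; the class name Base64Encoder and the sample keys are hypothetical.

```python
import base64
import json


class Base64Encoder(json.JSONEncoder):
    """Hypothetical encoder that base64-encodes any bytes value."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


# One value is binary, the other is ordinary text that happens to look
# exactly like the base64 form of that binary value.
encoded = json.dumps({"cert": b"\x00\x01\x02", "cn": "AAEC"}, cls=Base64Encoder)
decoded = json.loads(encoded)

# After decoding, both values are plain strings and cannot be told apart.
same_on_the_wire = decoded["cert"] == decoded["cn"]
```

Encoding works, but the decoder sees two identical strings and has no way
to tell which one should be turned back into binary.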
>
> That means if we want to use JSON we really should push the base64 
> encode/decode to the parts of the code which have a priori knowledge 
> about the objects they're pushing through the command interface. This 
> would mean any command which passes a certificate should base64 encode 
> it prior to sending it and base64 decode after it comes back from a
> command result. Actually it would be preferable to use PEM encoding,
> and by the way, the whole reason PEM encoding for certificates
> was developed was exactly this scenario: transporting a
> certificate through a text based interchange mechanism!
>
> Possible Solutions:
> -------------------
>
> As I see it we have these options in front of us for how to deal with 
> this problem:
>
> * Drop support for JSON, only use XML-RPC
>
> * Once we read a certificate from LDAP immediately convert it to PEM 
> format. Adopt the convention that anytime we exchange certificates it 
> will be in PEM format. Only convert from PEM format when the target 
> demands binary (e.g. storing it in LDAP, passing it to a library 
> expecting DER encoded data, etc.).
>
> * Come up with some hacky protocol on top of JSON which signals "this 
> string is really binary" and check for it on every JSON encode/decode 
> and cross our fingers no one tries to send a legitimate string which 
> would trigger the encode/decode.
>
> Question: Are certificates the one and only example of binary data we 
> exchange?
>
> Recommendation:
> ---------------
>
> My personal recommendation is we adopt the convention that 
> certificates are always PEM encoded. We've already run into many 
> problems trying to deduce what format a certificate is (e.g. binary, 
> base64, PEM). I think it would be good if we just put a stake in the
> ground and said "certificates are always PEM encoded" and be done with 
> all these problems we keep having with the data type of certificates.
This is what the directory server console does, which allows you to
copy/paste CA certs, cert requests, cert request responses, etc.
directly in the UI, and easily convert that PEM/ASCII format to other
formats.
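
As a small illustration of converting between the PEM text form and the
binary DER form, here is a sketch using the helpers in Python's standard
ssl module. It assumes Python 3, and der_bytes is a placeholder rather
than a real certificate (these helpers only wrap and unwrap base64; they
do not validate the certificate contents).

```python
import ssl

# Placeholder binary value (assumption: not a real certificate).
der_bytes = b"\x30\x82\x01\x0a\xff\xfe\x00binary"

# Binary DER -> PEM text, safe to pass through JSON or paste into a UI.
pem_text = ssl.DER_cert_to_PEM_cert(der_bytes)

# PEM text -> binary DER, only when the target demands binary
# (e.g. storing it in LDAP).
restored = ssl.PEM_cert_to_DER_cert(pem_text)
```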
>
> As an aside I'm also skeptical of the robustness of allowing binary 
> data at all in our implementation. Trying to support binary data has 
> been nothing but a headache and a source of many many bugs. Do we 
> really need it?
>



