[Freeipa-devel] JSON problems (the woes of binary data)

Jason Gerard DeRose jderose at redhat.com
Fri Feb 26 21:49:37 UTC 2010


On Fri, 2010-02-26 at 15:59 -0500, John Dennis wrote:
> The Problem:
> ------------
> 
> I've been looking at the encoding exception which is being thrown when 
> you click on the "Services" menu item in our current implementation. By 
> default we seem to be using JSON as our RPC mechanism. The exception is 
> being thrown when the JSON encoder hits a certificate. Recall that we 
> store certificates in LDAP as binary data and in our implementation we 
> distinguish binary data from text by Python object type: text is 
> *always* a unicode object and binary data is *always* a str object. 
> However, in Python 2.x str objects are commonly treated as text and are 
> subject to encoding/decoding in many parts of the Python world.

The CLI talks to the server over XML-RPC, but the webUI talks to the
server over JSON-RPC.  In the web client, handling JSON is fast and
easy, whereas handling XML is difficult and slow.

> Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are 
> *only* unicode strings. So what is happening is that when the JSON 
> encoder sees our certificate data in a str object it says "str objects 
> are text and we have to produce a UTF-8 unicode encoding from that str 
> object". There's the problem! It's completely nonsensical to try to 
> encode binary to UTF-8.

Yeah, I do wish JSON had a binary literal type.  This is obviously a bug
in my JSON-RPC code, but also an issue we need to solve for the UI.
When we send binary to the webUI, what is our intent?  I think that
displaying it as base64 encoded text is not generally what the user
wants.  I think displaying a link that will allow them to download the
file is generally a better idea.  Perhaps the Param should indicate how
it should be handled in the webUI.
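
To make the failure concrete, here is a minimal sketch.  It is written
in Python 3 terms, which is an assumption (our code runs on Python 2.x,
where the same data would live in a str object), and the sample bytes
and helper names are made up for illustration.  It shows that
DER-style bytes are not valid UTF-8 and that the stock JSON encoder
will not take them:

```python
import json

# Hypothetical stand-in for the first bytes of a DER-encoded
# certificate (a SEQUENCE tag followed by a long-form length).
der_prefix = b"\x30\x82\x03\x1f"


def is_valid_utf8(data):
    """Return True if the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def json_accepts(value):
    """Return True if the stock JSON encoder can serialize value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, UnicodeDecodeError):
        return False


print(is_valid_utf8(der_prefix))
print(json_accepts({"usercertificate": der_prefix}))
print(json_accepts({"cn": "plain text is fine"}))
```

Either way, the binary value has to be turned into text explicitly
before it reaches the JSON layer.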

> The right way to handle this is to encode the binary data to base64 
> ASCII text and then hand it to JSON. FWIW our XML-RPC handler does this 
> already because XML-RPC knows about binary data and elects to 
> encode/decode it to base64 as it's marshaled and unmarshaled. But JSON 
> can't do this during marshaling and unmarshaling because the JSON 
> protocol has no concept of binary data.
> 
> The Python JSON encoder class does give us the option to hook into the 
> encoder, check whether the object is a str object, and then base64 encode it. 
> But that doesn't help us at the opposite end. How would we know when 
> unmarshaling that a given string is supposed to be base64 decoded back 
> into binary data? We could prepend a special string and hope that string 
> never gets used by normal text (yuck). Keeping a list of what needs 
> base64 decoding is not an option within JSON because at the time of 
> decoding we have no information available about the context of the JSON 
> objects.

I think we could send it as a dict with a special key, something like:

  {'__base64__': b64encode(my_str)}
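
A rough sketch of how both ends could handle that dict, using the
hooks the standard json module provides.  This is written against
Python 3 (an assumption; there the binary type is bytes rather than
str), and the names B64_KEY, BinaryEncoder, decode_binary, dumps and
loads are hypothetical, not existing IPA code:

```python
import base64
import json

B64_KEY = "__base64__"


class BinaryEncoder(json.JSONEncoder):
    """Wrap binary values in a one-key dict holding base64 text."""

    def default(self, o):
        # default() is only called for objects the encoder
        # cannot serialize on its own, such as bytes.
        if isinstance(o, bytes):
            return {B64_KEY: base64.b64encode(o).decode("ascii")}
        return super().default(o)


def decode_binary(obj):
    """object_hook: turn the one-key marker dict back into bytes."""
    if len(obj) == 1 and isinstance(obj.get(B64_KEY), str):
        return base64.b64decode(obj[B64_KEY])
    return obj


def dumps(value):
    return json.dumps(value, cls=BinaryEncoder)


def loads(text):
    return json.loads(text, object_hook=decode_binary)


entry = {
    "cn": "HTTP/web.example.com",
    "usercertificate": [b"\x30\x82\x00\xff"],
}
wire = dumps(entry)
assert loads(wire) == entry
```

Because the marker is a dict rather than a string prefix, an ordinary
text value can never trigger the decode.  The one remaining collision
is a legitimate dict whose only key is '__base64__', which we would
have to treat as reserved.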

> That means if we want to use JSON we really should push the base64 
> encode/decode to the parts of the code which have a priori knowledge 
> about the objects they're pushing through the command interface. This 
> would mean any command which passes a certificate should base64 encode 
> it prior to sending it and base64 decode it after it comes back from a 
> command result. Actually it would be preferable to use PEM encoding. By 
> the way, the whole reason PEM encoding for certificates was developed 
> was exactly this scenario: transporting a certificate through a 
> text-based interchange mechanism!
> 
> Possible Solutions:
> -------------------
> 
> As I see it we have these options in front of us for how to deal with 
> this problem:
> 
> * Drop support for JSON, only use XML-RPC

We can't do this and keep the flexibility we need in the UI.  Also,
there is a strong trend to use JSON over XML lately (RPC or otherwise),
so I think we do ourselves a disservice by dropping the JSON-RPC.

> * Once we read a certificate from LDAP immediately convert it to PEM 
> format. Adopt the convention that anytime we exchange certificates it 
> will be in PEM format. Only convert from PEM format when the target 
> demands binary (e.g. storing it in LDAP, passing it to a library 
> expecting DER encoded data, etc.).
> 
> * Come up with some hacky protocol on top of JSON which signals "this 
> string is really binary" and check for it on every JSON encode/decode 
> and cross our fingers no one tries to send a legitimate string which 
> would trigger the encode/decode.
> 
> Question: Are certificates the one and only example of binary data we 
> exchange?

At this time, I believe so.  But it would be nice to have a plan for how
to deal with this in the future for other binary data.

> Recommendation:
> ---------------
> 
> My personal recommendation is we adopt the convention that certificates 
> are always PEM encoded. We've already run into many problems trying to 
> deduce what format a certificate is in (e.g. binary, base64, PEM). I think 
> it would be good if we just put a stake in the ground and said 
> "certificates are always PEM encoded" and be done with all these 
> problems we keep having with the data type of certificates.

+1.  Regardless of how (or if) we decide to handle generic binary data,
this seems a good approach for the certificate.
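
For illustration, the PEM convention could look roughly like the
sketch below: convert to PEM right after reading from LDAP, pass the
PEM text through JSON untouched, and convert back to DER only where
binary is required.  The helper names der_to_pem and pem_to_der and
the sample bytes are hypothetical, and the code assumes Python 3:

```python
import base64
import json

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def der_to_pem(der):
    """Wrap DER bytes as PEM text with 64-character base64 lines."""
    b64 = base64.b64encode(der).decode("ascii")
    body = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([PEM_HEADER] + body + [PEM_FOOTER]) + "\n"


def pem_to_der(pem):
    """Strip the PEM armor and return the DER bytes."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    if len(lines) < 2 or lines[0] != PEM_HEADER or lines[-1] != PEM_FOOTER:
        raise ValueError("not a PEM certificate")
    return base64.b64decode("".join(lines[1:-1]))


# Hypothetical stand-in for DER bytes read from LDAP; any bytes
# work here because the helpers do not parse the certificate.
der = bytes(range(256))
pem = der_to_pem(der)

# PEM is plain ASCII text, so JSON carries it without any hooks.
wire = json.dumps({"usercertificate": pem})
assert pem_to_der(json.loads(wire)["usercertificate"]) == der
```

Python's ssl module also ships DER_cert_to_PEM_cert() and
PEM_cert_to_DER_cert(), which do the same wrapping and unwrapping.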

> As an aside I'm also skeptical of the robustness of allowing binary data 
> at all in our implementation. Trying to support binary data has been 
> nothing but a headache and a source of many many bugs. Do we really need it?
> 
> -- 
> John Dennis <jdennis at redhat.com>
> 



