[Freeipa-devel] JSON problems (the woes of binary data)
Jason Gerard DeRose
jderose at redhat.com
Fri Feb 26 21:49:37 UTC 2010
On Fri, 2010-02-26 at 15:59 -0500, John Dennis wrote:
> The Problem:
> ------------
>
> I've been looking at the encoding exception which is being thrown when
> you click on the "Services" menu item in our current implementation. By
> default we seem to be using JSON as our RPC mechanism. The exception is
> being thrown when the JSON encoder hits a certificate. Recall that we
> store certificates in LDAP as binary data and in our implementation we
> distinguish binary data from text by Python object type, text is
> *always* a unicode object and binary data is *always* a str object.
> However in Python 2.x str objects are believed to be text and are
> subject to encoding/decoding in many parts of the Python world.
The CLI communicates with the server over XML-RPC, but the webUI
communicates with the server over JSON-RPC. Dealing with JSON on the web
client is fast and easy, whereas XML is difficult and slow.
> Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are
> *only* unicode strings. So what is happening is that when the JSON
> encoder sees our certificate data in a str object it says "str objects
> are text and we have to produce a UTF-8 unicode encoding from that str
> object". There's the problem! It's completely nonsensical to try to
> encode binary to UTF-8.
Yeah, I do wish JSON had a binary literal type. This is obviously a bug
in my JSON-RPC code, but also an issue we need to solve for the UI.
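To make the failure concrete, here is a minimal sketch. It is written for Python 3, where bytes and str are separate types, so the standard json encoder rejects binary with a TypeError rather than the encoding exception seen under Python 2. The attribute name and the byte values are made up for illustration; they are not a real certificate.

```python
import json

# Hypothetical stand-in for DER certificate data read from LDAP.
cert_der = bytes([0x30, 0x82, 0x01, 0xFF])


def try_encode(value):
    """Return the JSON text for value, or None if it cannot be encoded."""
    try:
        return json.dumps(value)
    except TypeError:
        # JSON has no binary type, so the encoder refuses bytes.
        return None


def is_valid_utf8(data):
    """Report whether the bytes happen to form valid UTF-8 text."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


print(try_encode({'cn': 'example'}))
print(try_encode({'usercertificate': cert_der}))
print(is_valid_utf8(cert_der))
```

Text values pass through untouched, while the binary value is rejected, and treating the same bytes as UTF-8 text fails as well.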
When we send binary to the webUI, what is our intent? I think that
displaying it as base64 encoded text is not generally what the user
wants. I think displaying a link that will allow them to download the
file is generally a better idea. Perhaps the Param should indicate how
it should be handled in the webUI.
> The right way to handle this is to encode the binary data to base64
> ASCII text and then hand it to JSON. FWIW our XML-RPC handler does this
> already because XML-RPC knows about binary data and elects to
> encode/decode it to base64 as it's marshaled and unmarshaled. But JSON
> can't do this during marshaling and unmarshaling because the JSON
> protocol has no concept of binary data.
>
> The python JSON encoder class does give us the option to hook into the
> encoder and check if the object is a str object and then base64 encode.
> But that doesn't help us at the opposite end. How would we know when
> unmarshaling that a given string is supposed to be base64 decoded back
> into binary data? We could prepend a special string and hope that string
> never gets used by normal text (yuck). Keeping a list of what needs
> base64 decoding is not an option within JSON because at the time of
> decoding we have no information available about the context of the JSON
> objects.
I think sending it as a dict with a special key would work, something like:
{'__base64__': b64encode(my_str)}
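A rough sketch of how both ends of that could look, using the hooks the standard json module provides. It assumes Python 3, where binary is a bytes object and the encoder's `default` hook is called for it; under Python 2 a str would not reach that hook. The function names are hypothetical, not existing IPA code.

```python
import base64
import json

B64_KEY = '__base64__'


def encode_binary(obj):
    """json 'default' hook: wrap bytes in a single-key marker dict."""
    if isinstance(obj, bytes):
        return {B64_KEY: base64.b64encode(obj).decode('ascii')}
    raise TypeError('cannot serialize %r' % type(obj))


def decode_binary(dct):
    """json 'object_hook': unwrap a marker dict back into bytes."""
    if set(dct) == {B64_KEY}:
        return base64.b64decode(dct[B64_KEY])
    return dct


def dumps(value):
    return json.dumps(value, default=encode_binary)


def loads(text):
    return json.loads(text, object_hook=decode_binary)


result = {'cn': 'example', 'usercertificate': b'\x30\x82\x01\xff'}
assert loads(dumps(result)) == result
```

The decoder needs no outside context because the marker travels with the value. The one collision left is a legitimate dict whose only key is `__base64__`, which is far less likely than a magic string prefix showing up in ordinary text.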
> That means if we want to use JSON we really should push the base64
> encode/decode to the parts of the code which have a priori knowledge
> about the objects they're pushing through the command interface. This
> would mean any command which passes a certificate should base64 encode
> it prior to sending it and base64 decode it after it comes back from a
> command result. Actually it would be preferable to use PEM encoding, and
> by the way, the whole reason why the PEM encoding for certificates was
> developed was exactly for this scenario: transporting a certificate
> through a text based interchange mechanism!
>
> Possible Solutions:
> -------------------
>
> As I see it we have these options in front of us for how to deal with
> this problem:
>
> * Drop support for JSON, only use XML-RPC
We can't do this and keep the flexibility we need in the UI. Also,
there is a strong trend to use JSON over XML lately (RPC or otherwise),
so I think we do ourselves a disservice by dropping the JSON-RPC.
> * Once we read a certificate from LDAP immediately convert it to PEM
> format. Adopt the convention that anytime we exchange certificates it
> will be in PEM format. Only convert from PEM format when the target
> demands binary (e.g. storing it in LDAP, passing it to a library
> expecting DER encoded data, etc.).
>
> * Come up with some hacky protocol on top of JSON which signals "this
> string is really binary" and check for it on every JSON encode/decode
> and cross our fingers no one tries to send a legitimate string which
> would trigger the encode/decode.
>
> Question: Are certificates the one and only example of binary data we
> exchange?
At this time, I believe so. But it would be nice to have a plan for how
to deal with this in the future for other binary data.
> Recommendation:
> ---------------
>
> My personal recommendation is we adopt the convention that certificates
> are always PEM encoded. We've already run into many problems trying to
> deduce what format a certificate is in (e.g. binary, base64, PEM). I think
> it would be good if we just put a stake in the ground and said
> "certificates are always PEM encoded" and be done with all these
> problems we keep having with the data type of certificates.
+1. Regardless of how (or if) we decide to handle generic binary data,
this seems a good approach for the certificate.
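As an illustration of the PEM convention, here is a small sketch built on the conversion helpers in Python's standard ssl module. It assumes Python 3; the wrapper names are hypothetical, and the bytes are a placeholder rather than a real certificate (these helpers only wrap and unwrap the base64 body, they do not validate the DER structure).

```python
import json
import ssl


def der_to_pem(der_bytes):
    """Convert binary (DER) certificate data to PEM text."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text):
    """Convert PEM text back to binary (DER) for LDAP or crypto libraries."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


# Placeholder bytes standing in for a certificate read from LDAP.
cert_der = b'\x30\x82\x01\xff'

# Convert once on the way out of LDAP; PEM is plain text, so JSON is happy.
cert_pem = der_to_pem(cert_der)
payload = json.dumps({'usercertificate': cert_pem})

# Convert back only where a consumer demands binary.
received = json.loads(payload)['usercertificate']
assert pem_to_der(received) == cert_der
```

Because the PEM form is ASCII text, it passes through JSON-RPC and XML-RPC alike without any special marker, and the BEGIN/END lines make the format self-identifying.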
> As an aside I'm also skeptical of the robustness of allowing binary data
> at all in our implementation. Trying to support binary data has been
> nothing but a headache and a source of many many bugs. Do we really need it?
>
> --
> John Dennis <jdennis at redhat.com>