[Freeipa-devel] JSON problems (the woes of binary data)
John Dennis
jdennis at redhat.com
Fri Feb 26 20:59:53 UTC 2010
The Problem:
------------
I've been looking at the encoding exception which is being thrown when
you click on the "Services" menu item in our current implementation. By
default we seem to be using JSON as our RPC mechanism. The exception is
being thrown when the JSON encoder hits a certificate. Recall that we
store certificates in LDAP as binary data and in our implementation we
distinguish binary data from text by Python object type: text is
*always* a unicode object and binary data is *always* a str object.
However, in Python 2.x str objects are assumed to be text and are
subject to encoding/decoding in many parts of the Python world.
Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are
*only* unicode strings. So what is happening is that when the JSON
encoder sees our certificate data in a str object it says "str objects
are text, so we have to interpret that str object as UTF-8 to produce
unicode". There's the problem! It's completely nonsensical to try to
interpret binary data as UTF-8.
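
A minimal sketch of the failure, written for Python 3, where bytes plays the role that str plays in the 2.x code discussed here and str plays the role of unicode. The byte string is a made-up stand-in for DER data, not a real certificate, and the "usercertificate" key is only illustrative.

```python
import json

# Hypothetical stand-in for DER-encoded certificate bytes (not a real cert).
der_bytes = b"\x30\x82\x01\x0a\x02\x82\xff\xfe"


def decodes_as_utf8(data):
    """Return True if the bytes happen to be valid UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def json_accepts(value):
    """Return True if the stock JSON encoder can serialize the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, UnicodeDecodeError):
        return False


# Arbitrary binary data is generally not valid UTF-8 ...
print(decodes_as_utf8(der_bytes))
# ... so the stock encoder has no sensible way to emit it as a JSON string.
print(json_accepts({"usercertificate": der_bytes}))
```
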
The right way to handle this is to encode the binary data to base64
ASCII text and then hand it to JSON. FWIW our XML-RPC handler does this
already because XML-RPC knows about binary data and elects to
encode/decode it to base64 as it's marshaled and unmarshaled. But JSON
can't do this during marshaling and unmarshaling because the JSON
protocol has no concept of binary data.
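
As a rough illustration of "encode to base64 first, then hand it to JSON", here is a sketch in Python 3 (bytes standing in for the 2.x str). The helper names, the sample bytes and the "certificate" key are assumptions made for this example only.

```python
import base64
import json


def binary_to_json_text(data):
    """Turn binary data into ASCII text that JSON can carry."""
    return base64.b64encode(data).decode("ascii")


def json_text_to_binary(text):
    """Reverse binary_to_json_text(); the caller must know the value was binary."""
    return base64.b64decode(text)


# Hypothetical stand-in for DER certificate bytes.
der_bytes = b"\x30\x82\x01\x0a\x02\x82\xff\xfe"

# The sender encodes explicitly before JSON ever sees the value.
wire = json.dumps({"certificate": binary_to_json_text(der_bytes)})

# The receiver gets ordinary text back and must decode it explicitly.
received = json.loads(wire)
restored = json_text_to_binary(received["certificate"])
```

Note that both ends have to agree out of band that "certificate" holds base64 text; nothing in the JSON itself says so.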
The Python JSON encoder class does give us the option to hook into the
encoder and check if the object is a str object and then base64 encode.
But that doesn't help us at the opposite end. How would we know when
unmarshaling that a given string is supposed to be base64 decoded back
into binary data? We could prepend a special string and hope that string
never gets used by normal text (yuck). Keeping a list of what needs
base64 decoding is not an option within JSON because at the time of
decoding we have no information available about the context of the JSON
objects.
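
The asymmetry can be sketched as follows. This uses the Python 3 json module, where the JSONEncoder.default() hook is called for bytes; how the equivalent hook would be wired into the 2.x encoder is not shown here. The class name and the sample payload are made up for the example.

```python
import base64
import json


class Base64Encoder(json.JSONEncoder):
    """Encoder hook: base64 encode any binary value on the way out."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


# One value is genuine text that merely looks like base64,
# the other is binary data that the hook will base64 encode.
payload = {"cn": "aGVsbG8=", "certificate": b"hello"}

wire = json.dumps(payload, cls=Base64Encoder, sort_keys=True)
decoded = json.loads(wire)

# After decoding, both values are plain text and indistinguishable.
print(decoded["cn"] == decoded["certificate"])
```

The encoding side works, but the decoder has no type information left to tell which of the two strings should be turned back into binary.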
That means if we want to use JSON we really should push the base64
encode/decode to the parts of the code which have a priori knowledge
about the objects they're pushing through the command interface. This
would mean any command which passes a certificate should base64 encode
it prior to sending it and base64 decode it after it comes back from a
command result. Actually it would be preferable to use PEM encoding, and
by the way, the whole reason why the PEM encoding for certificates was
developed was exactly for this scenario: transporting a certificate
through a text based interchange mechanism!
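
A sketch of what converting at the command boundary could look like, using the DER/PEM helpers from Python's standard ssl module. The function names send_certificate and receive_certificate, the "certificate" key and the sample bytes are hypothetical; they are not part of our command interface.

```python
import json
import ssl

# Hypothetical stand-in for DER certificate bytes. The ssl helpers used
# below only wrap and unwrap base64, so the bytes need not parse as a cert.
der_bytes = b"\x30\x82\x01\x0a\x02\x82\xff\xfe"


def send_certificate(der):
    """Caller side: convert DER to PEM text before it reaches JSON."""
    pem = ssl.DER_cert_to_PEM_cert(der)
    return json.dumps({"certificate": pem})


def receive_certificate(wire):
    """Receiver side: this code knows the field is a PEM certificate."""
    pem = json.loads(wire)["certificate"]
    return ssl.PEM_cert_to_DER_cert(pem)


wire = send_certificate(der_bytes)
restored = receive_certificate(wire)
```

The JSON layer only ever sees text; the knowledge that the field is a certificate lives in the code that calls it.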
Possible Solutions:
-------------------
As I see it we have these options in front of us for how to deal with
this problem:
* Drop support for JSON, only use XML-RPC
* Once we read a certificate from LDAP immediately convert it to PEM
format. Adopt the convention that anytime we exchange certificates it
will be in PEM format. Only convert from PEM format when the target
demands binary (e.g. storing it in LDAP, passing it to a library
expecting DER encoded data, etc.).
* Come up with some hacky protocol on top of JSON which signals "this
string is really binary" and check for it on every JSON encode/decode
and cross our fingers no one tries to send a legitimate string which
would trigger the encode/decode.
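
To make the risk in the last option concrete, here is a sketch of such a marker scheme in Python 3 (bytes standing in for the 2.x str). The marker string and the helper names are invented for this example.

```python
import base64
import json

MARKER = "__base64__:"  # hypothetical in-band signal for "this is binary"


def encode_value(value):
    """Prefix binary values with the marker; leave text untouched."""
    if isinstance(value, bytes):
        return MARKER + base64.b64encode(value).decode("ascii")
    return value


def decode_value(value):
    """Treat any string that starts with the marker as binary."""
    if isinstance(value, str) and value.startswith(MARKER):
        return base64.b64decode(value[len(MARKER):])
    return value


binary_in = b"\x30\x82\x01\x0a"
# Legitimate text that happens to begin with the marker.
text_in = "__base64__:aGVsbG8="

wire = json.dumps([encode_value(binary_in), encode_value(text_in)])
binary_out, text_out = [decode_value(v) for v in json.loads(wire)]

print(binary_out == binary_in)  # binary survives the round trip
print(text_out == text_in)      # the text does not: it comes back as bytes
```

Escaping the marker inside ordinary text would close this hole, at the cost of yet more protocol layered on top of JSON.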
Question: Are certificates the one and only example of binary data we
exchange?
Recommendation:
---------------
My personal recommendation is we adopt the convention that certificates
are always PEM encoded. We've already run into many problems trying to
deduce what format a certificate is in (e.g. binary, base64, PEM). I think
it would be good if we just put a stake in the ground, said
"certificates are always PEM encoded", and were done with all these
problems we keep having with the data type of certificates.
As an aside I'm also skeptical of the robustness of allowing binary data
at all in our implementation. Trying to support binary data has been
nothing but a headache and a source of many, many bugs. Do we really need it?
--
John Dennis <jdennis at redhat.com>