[Freeipa-devel] Re: Encoding of Kerberos principal
Don Davis
dodavis at redhat.com
Wed Jul 8 14:35:50 UTC 2009
On 07/08/2009 05:09 AM, Jason Gerard DeRose wrote:
> On Tue, 2009-07-07 at 10:08 -0400, Don Davis wrote:
>>> > Is it safe to assume that the principal is UTF-8 encoded
>>> > (as far as the MIT Kerberos library is concerned)?
>>>
>>> the short answer is "no;" in general, principal names
>>> are claimed to be encoding-agnostic, meaning that they're
>>> not null-terminated, and the code is supposed to ignore the
>>> bytes' internal structure; a principal-name is just void*+length,
>>> more-or-less.
>
> Well, as far as the principal and realm, I'm only using it two ways:
>
> 1. Client-side I extract the principal/realm from the default credential
> cache of the user running the process.
>
> 2. Client or server side, I get the default realm (which just comes
> from /etc/krb5.conf, AFAIK).
>
> I don't know if the Kerberos standard even allows non-ascii characters
> in the realm, so I think the only gotcha is with the user-name portion
> of the principal.
>
> Anyway, in the above situations, can I expect any consistency? Does the
> Kerberos server negotiate the character encoding with the client? Like
> what happens if I have non-ascii characters in my user-name and am
> authenticating from a Windows box to a Linux authentication server?
hi, jason --
i think you're doing as well as you can, but you can't expect perfect
consistency. the krb protocol does _not_ negotiate a character-set.
the krb spec says, on p.52:
"In practice, many implementations treat [name strings] as if they
were 8-bit strings of whichever character set the implementation
defaults to, without regard to correct usage of character-set
designation escape sequences. The default character set is often
determined by the current user's operating system-dependent locale.
At least one major implementation places unescaped UTF-8 encoded
Unicode characters in the [name strings]. This failure to adhere
to the ... specifications results in interoperability issues when
conflicting character encodings are utilized by the Kerberos
clients, services, and KDC..."
-- http://www.ietf.org/rfc/rfc4120.txt
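to make the quoted problem concrete, here is a minimal python sketch (not part of the original thread). it shows one name producing different bytes under two encodings, and a defensive decoder that tries utf-8 first. the helper name decode_principal and the latin-1 fallback are assumptions chosen for illustration, not anything the MIT library provides.

```python
from typing import Tuple


def decode_principal(raw: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Hypothetical helper: decode principal-name bytes.

    Returns (text, encoding_used). UTF-8 is tried first; the fallback
    is an assumption, since the protocol does not say which charset
    the sender used.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail,
        # but the result is only a guess at what the sender meant.
        return raw.decode(fallback), fallback


name = "jos\u00e9"                   # "josé"
as_utf8 = name.encode("utf-8")       # b'jos\xc3\xa9'
as_latin1 = name.encode("latin-1")   # b'jos\xe9'

# Same name, different bytes on the wire:
print(as_utf8, as_latin1)

# A peer that guesses the wrong charset sees a different name:
print(as_utf8.decode("latin-1"))     # mojibake: 'josÃ©'

print(decode_principal(as_utf8))
print(decode_principal(as_latin1))
```

comparing principals as raw bytes and decoding only for display avoids depending on the guess.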
in other words, if a customer's hosts all use the same locale, they
might get OK results when AD & MIT try to talk to each other about
names. from my reading, i expect krb implementations are most likely
to screw up on names and passwords that contain a mixture of characters
from different languages (like using a greek letter as part of an
english or german name). this weird-but-legal usage is where the
above-mentioned "escape sequences" kick in, and krb implementations
vary in how well they deal with such mixed-language names.
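a rough illustration of the mixed-language case, as a python sketch. python's iso2022_jp codec stands in here only because it is a readily available member of the iso-2022 family; that choice, and the sample name, are assumptions and not what any particular krb implementation does.

```python
# An English name containing one Greek letter: "aλice".
mixed = "a\u03bbice"

# A single-byte Western charset has no slot for the Greek letter.
try:
    mixed.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# An ISO 2022 style encoding switches character sets mid-string by
# emitting escape sequences (ESC = 0x1b) around the non-ASCII part.
iso2022 = mixed.encode("iso2022_jp")
has_escape = b"\x1b" in iso2022

# A decoder that honors the escapes recovers the original name.
round_trip = iso2022.decode("iso2022_jp")

print(latin1_ok, has_escape, iso2022, round_trip == mixed)
```

a receiver that ignores the escape sequences and treats the bytes as plain 8-bit text will see the escape bytes as part of the name.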
unfortunately, the kerberos spec relies heavily on the ASN.1 spec,
which was & is screwed-up on the subject of i18n, just as ASN.1 is
screwed up in other ways. if you need a more-precise/standard answer,
i'd suggest you read:
* pp.52-54 of rfc4120: http://www.ietf.org/rfc/rfc4120.txt
* skim the wikipedia description of iso-2022, which is the
legacy i18n mechanism, predating unicode & utf8, that krb
implementations are supposed to support correctly:
http://en.wikipedia.org/wiki/ISO/IEC_2022
* skim iso-8859 (iso latin), too. this is the modern standard
for most of the commercially-important alphabetic languages
(but not asia), and it's arguably what you wish krb would
support well: http://en.wikipedia.org/wiki/ISO/IEC_8859
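one reason locale matters with iso-8859: the same byte means a different letter in each part of the standard. a small python sketch, using codec names from python's standard library:

```python
raw = b"\xe9"  # one byte above the ASCII range

# Decode the same byte under three ISO 8859 parts:
# part 1 (Western European), part 5 (Cyrillic), part 7 (Greek).
decoded = {
    codec: raw.decode(codec)
    for codec in ("iso8859_1", "iso8859_5", "iso8859_7")
}

for codec, char in decoded.items():
    print(codec, repr(char))
```

so two hosts with different default locales can disagree about a name even though the bytes are identical.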
for interoperability, there's no substitute for testing. further,
i think we need to decide early which foreign character-sets we
can put off worrying about, based on which languages our paying
customers use. the prioritized list i've used elsewhere is:
Europe > Japan > Korea > China > Russia > India > Middle East,
but redhat's customer-base is probably different.
- don