[Freeipa-devel] [PATCH 0070] Normalization check only for IDNA domains

Fri Jun 27 08:58:21 UTC 2014

On Fri, 27 Jun 2014, Jan Cholasta wrote:
>On 27.6.2014 10:29, Alexander Bokovoy wrote:
>>On Fri, 27 Jun 2014, Jan Cholasta wrote:
>>>On 27.6.2014 10:15, Alexander Bokovoy wrote:
>>>>On Fri, 20 Jun 2014, Martin Basti wrote:
>>>>>On Fri, 2014-06-20 at 10:32 +0200, Jan Cholasta wrote:
>>>>>>On 18.6.2014 16:49, Martin Basti wrote:
>>>>>>>Due to compatibility with older versions, only IDNA domains should be
>>>>>>>checked
>>>>>>>Patch attached.
>>>>>>
>>>>>>I'm not particularly happy about the u'\xdf' special case. Isn't
>>>>>>there a
>>>>>>better way to do this check?
>>>>>I can't find a better way. u'\xdf' is mapped to ss, and ss is not an IDN
>>>>>string.
>>>>>
>>>>>Or just remove this validation.
>>>>>
>>>>>>(BTW I really think this should be a warning, not an error, but that
>>>>>>would require larger amount of work, so I guess it's OK for now.)
>>>>>(More pain than gain)
>>>>Main thing in this patch is that the check should not be done against
>>>>non-IDN strings. I want this version of the patch to go in for that
>>>>reason as currently you cannot even complete ipa-adtrust-install run due
>>>>to IDN normalisation check being applied to non-IDN domains.
>>>
>>>On non-IDN domains, the only effect of IDN normalization is that it
>>>lower-cases the names (right?), so the check should compare
>>>lower-cased original name with the normalized name, instead of
>>>special-casing certain characters etc.
>>.. what's the reason to do such comparison then? lower-cased non-IDN
>>name will be equal to lower-cased normalized non-IDN name by definition,
>>so the check is not needed in this case, at all.
>
>The point is that it works for both IDN and non-IDN, without 
>u'\xdf'-style hacks.
No, your proposal of comparing the lower-cased value with the normalized
value is not going to work: the lower-cased value is in general not equal
to the normalized value for IDN names, only for non-IDN ones. The reason
is that lower-casing a non-ASCII Unicode character may map it to a
completely different character than normalization does. Take, for example,
the Turkish alphabet, where there are six letters with different case rules
(uppercase dotted I, dotless lowercase i, upper- and lowercase G with
breve, and upper- and lowercase S with cedilla), which will break
your generalized check.

So you will need to split these cases anyway.
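
To illustrate the mismatch, here is a minimal sketch in Python 3 using the
standard library's encodings.idna.nameprep (IDNA 2003 nameprep). It uses
the u'\xdf' character from this thread rather than the Turkish letters.
The helper names is_ascii and lowercase_equals_nameprep are made up for
this example and are not taken from the patch:

```python
from encodings.idna import nameprep


def is_ascii(label):
    """Return True if the label is pure ASCII, i.e. a non-IDN label."""
    try:
        label.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def lowercase_equals_nameprep(label):
    """Compare plain lower-casing with IDNA 2003 nameprep for one label."""
    return label.lower() == nameprep(label)


# Non-IDN label: nameprep only lower-cases it, so both sides agree.
print(is_ascii("Example"), lowercase_equals_nameprep("Example"))

# IDN label containing u'\xdf': lower() leaves the sharp s alone,
# while nameprep maps it to "ss", so the generalized check disagrees.
print(is_ascii("Stra\xdfe"), lowercase_equals_nameprep("Stra\xdfe"))
print(nameprep("Stra\xdfe"))
```

For ASCII-only labels the comparison is always true and therefore tells us
nothing, while for IDN labels it can be false for reasons unrelated to what
the check is meant to catch.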

-- 
/ Alexander Bokovoy