[Freeipa-devel] [PATCH 0070] Normalization check only for IDNA domains

Mon Jun 30 08:43:32 UTC 2014

On Mon, 30 Jun 2014, Martin Basti wrote:
>On Fri, 2014-06-27 at 14:03 +0300, Alexander Bokovoy wrote:
>> On Fri, 27 Jun 2014, Martin Kosek wrote:
>> >On 06/27/2014 12:10 PM, Alexander Bokovoy wrote:
>> >> On Fri, 27 Jun 2014, Petr Spacek wrote:
>> >>> On 27.6.2014 11:21, Jan Cholasta wrote:
>> >>>> On 27.6.2014 10:58, Alexander Bokovoy wrote:
>> >>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
>> >>>>>> On 27.6.2014 10:29, Alexander Bokovoy wrote:
>> >>>>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
>> >>>>>>>> On 27.6.2014 10:15, Alexander Bokovoy wrote:
>> >>>>>>>>> On Fri, 20 Jun 2014, Martin Basti wrote:
>> >>>>>>>>>> On Fri, 2014-06-20 at 10:32 +0200, Jan Cholasta wrote:
>> >>>>>>>>>>> On 18.6.2014 16:49, Martin Basti wrote:
>> >>>>>>>>>>>> Due to compability with older versions, only IDNA domains should be
>> >>>>>>>>>>>> checked
>> >>>>>>>>>>>> Patch attached.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I'm not particularly happy about the u'\xdf' special case. Isn't
>> >>>>>>>>>>> there a
>> >>>>>>>>>>> better way to do this check?
>> >>>>>>>>>> I cant find better way. u'\xdf' is mapped to ss, and ss is not IDN
>> >>>>>>>>>> string.
>> >>>>>>>>>>
>> >>>>>>>>>> Or just remove this validation.
>> >>>>>>>>>>
>> >>>>>>>>>>> (BTW I really think this should be a warning, not an error, but that
>> >>>>>>>>>>> would require larger amount of work, so I guess it's OK for now.)
>> >>>>>>>>>> (More pain than gain)
>> >>>>>>>>> Main thing in this patch is that the check should not be done against
>> >>>>>>>>> non-IDN strings. I want this version of the patch to go in for that
>> >>>>>>>>> reason as currently you cannot even complete ipa-adtrust-install
>> >>>>>>>>> run due
>> >>>>>>>>> to IDN normalisation check being applied to non-IDN domains.
>> >>>>>>>>
>> >>>>>>>> On non-IDN domains, the only effect of IDN normalization is that it
>> >>>>>>>> lower-cases the names (right?), so the check should compare
>> >>>>>>>> lower-cased original name with the normalized name, instead of
>> >>>>>>>> special-casing certain characters etc.
>> >>>>>>> .. what's the reason to do such comparison then? lower-cased non-IDN
>> >>>>>>> name will be equal to lower-cased normalized non-IDN name by definition,
>> >>>>>>> so the check is not needed in this case, at all.
>> >>>>>>
>> >>>>>> The point is that it works for both IDN and non-IDN, without
>> >>>>>> u'\xdf'-style hacks.
>> >>>>> No, your proposal of comparing low-cased value and normalized value is
>> >>>>> not going to work because low-cased value is in general not equal to
>> >>>>> normalized value for IDN names, only for non-IDN ones, due to the fact
>> >>>>> that lower case for non-ASCII Unicode character may map to a completely
>> >>>>> different character than in normalization situation. Take, for example,
>> >>>>> Turkish alphabet where there are six letters with different case rules
>> >>>>> (uppercase dotted i, dottless lowercase i, upper- and lowercase G with
>> >>>>> breve accent, and upper- and lowercase S with cedilla), which will break
>> >>>>> your generalized check.
>> >>>>> So you'll anyway will need to split these cases.
>> >>>>>
>> >>>>
>> >>>> I see.
>> >>>>
>> >>>> I'm still not comfortable with carrying the bit of knowledge about u'\xdf' in
>> >>>> this particular spot. Can we check that a name is IDN some other way than
>> >>>> "domain_name.is_idn() or u'\xdf' in value"?
>> >>>
>> >>> Why can't we simply fix string constants in ipa-adtrust-install and avoid
>> >>> adding hacks for it?
>> >> Because they are correct, in the sense that they follow what is defined
>> >> for Active Directory. Yes, AD puts them in that case into DNS. There is
>> >> simply no reason to force lower case for non-IDN names.
>> >>
>> >> That said, a newer fix is attached, where error message is formatted
>> >> properly.
>> >
>> >I would personally be OK with the change if the is_* are fixed as Honza
>> >proposed, current way is not so Python-ic/readable. I.e.:
>> >
>> >Instead of
>> >+            is_idna = True in [encodings.idna.ToASCII(x) != x for x in labels]
>> >Use
>> >+            is_idna = any(encodings.idna.ToASCII(x) != x for x in labels)
>> >
>> >Instead of
>> >+                is_nonnorm = True in [encodings.idna.nameprep(x) != x for x in
>> >labels]
>> >use
>> >+                is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
>> >
>> >However, we can wait till Monday for Martin2's feedback.
>> I've fixed this and also made a proper split on all dots that could
>> separate labels according to RFC3490:
>>
>>     U+002E ( . ) FULL STOP
>>     U+FF0E ( ． ) FULLWIDTH FULL STOP
>>     U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
>>     U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
>>
>>
>> _______________________________________________
>> Freeipa-devel mailing list
>> Freeipa-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/freeipa-devel
>
>Hi,
>I analyzed how python detects IDNA labels.
>
>Python tests if domain is IDNA in this way:
>
>def ToASCII(label):
>    try:
>        # Step 1: try ASCII
>        label = label.encode("ascii")
>    except UnicodeError:
>        pass
>    else:
>        # Skip to step 3: UseSTD3ASCIIRules is false, so
>        # Skip to step 8.
>        if 0 < len(label) < 64:
>            return label
>        raise UnicodeError("label empty or too long")
>
>    # Step 2: nameprep
>    label = nameprep(label)
>...
>
>We can use 'label = label.encode("ascii")' to detect if IDNA is needed,
>without idna.ToASCII() conversion, and then use:
>
>is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
Sounds good but don't forget exceptions' handling. :)

-- 
/ Alexander Bokovoy