[Freeipa-devel] [PATCH 0070] Normalization check only for IDNA domains

Mon Jun 30 07:43:07 UTC 2014

On Fri, 2014-06-27 at 14:03 +0300, Alexander Bokovoy wrote:
> On Fri, 27 Jun 2014, Martin Kosek wrote:
> >On 06/27/2014 12:10 PM, Alexander Bokovoy wrote:
> >> On Fri, 27 Jun 2014, Petr Spacek wrote:
> >>> On 27.6.2014 11:21, Jan Cholasta wrote:
> >>>> On 27.6.2014 10:58, Alexander Bokovoy wrote:
> >>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
> >>>>>> On 27.6.2014 10:29, Alexander Bokovoy wrote:
> >>>>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
> >>>>>>>> On 27.6.2014 10:15, Alexander Bokovoy wrote:
> >>>>>>>>> On Fri, 20 Jun 2014, Martin Basti wrote:
> >>>>>>>>>> On Fri, 2014-06-20 at 10:32 +0200, Jan Cholasta wrote:
> >>>>>>>>>>> On 18.6.2014 16:49, Martin Basti wrote:
> >>>>>>>>>>>> Due to compability with older versions, only IDNA domains should be
> >>>>>>>>>>>> checked
> >>>>>>>>>>>> Patch attached.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not particularly happy about the u'\xdf' special case. Isn't
> >>>>>>>>>>> there a
> >>>>>>>>>>> better way to do this check?
> >>>>>>>>>> I cant find better way. u'\xdf' is mapped to ss, and ss is not IDN
> >>>>>>>>>> string.
> >>>>>>>>>>
> >>>>>>>>>> Or just remove this validation.
> >>>>>>>>>>
> >>>>>>>>>>> (BTW I really think this should be a warning, not an error, but that
> >>>>>>>>>>> would require larger amount of work, so I guess it's OK for now.)
> >>>>>>>>>> (More pain than gain)
> >>>>>>>>> Main thing in this patch is that the check should not be done against
> >>>>>>>>> non-IDN strings. I want this version of the patch to go in for that
> >>>>>>>>> reason as currently you cannot even complete ipa-adtrust-install
> >>>>>>>>> run due
> >>>>>>>>> to IDN normalisation check being applied to non-IDN domains.
> >>>>>>>>
> >>>>>>>> On non-IDN domains, the only effect of IDN normalization is that it
> >>>>>>>> lower-cases the names (right?), so the check should compare
> >>>>>>>> lower-cased original name with the normalized name, instead of
> >>>>>>>> special-casing certain characters etc.
> >>>>>>> .. what's the reason to do such comparison then? lower-cased non-IDN
> >>>>>>> name will be equal to lower-cased normalized non-IDN name by definition,
> >>>>>>> so the check is not needed in this case, at all.
> >>>>>>
> >>>>>> The point is that it works for both IDN and non-IDN, without
> >>>>>> u'\xdf'-style hacks.
> >>>>> No, your proposal of comparing low-cased value and normalized value is
> >>>>> not going to work because low-cased value is in general not equal to
> >>>>> normalized value for IDN names, only for non-IDN ones, due to the fact
> >>>>> that lower case for non-ASCII Unicode character may map to a completely
> >>>>> different character than in normalization situation. Take, for example,
> >>>>> Turkish alphabet where there are six letters with different case rules
> >>>>> (uppercase dotted i, dottless lowercase i, upper- and lowercase G with
> >>>>> breve accent, and upper- and lowercase S with cedilla), which will break
> >>>>> your generalized check.
> >>>>> So you'll anyway will need to split these cases.
> >>>>>
> >>>>
> >>>> I see.
> >>>>
> >>>> I'm still not comfortable with carrying the bit of knowledge about u'\xdf' in
> >>>> this particular spot. Can we check that a name is IDN some other way than
> >>>> "domain_name.is_idn() or u'\xdf' in value"?
> >>>
> >>> Why can't we simply fix string constants in ipa-adtrust-install and avoid
> >>> adding hacks for it?
> >> Because they are correct, in the sense that they follow what is defined
> >> for Active Directory. Yes, AD puts them in that case into DNS. There is
> >> simply no reason to force lower case for non-IDN names.
> >>
> >> That said, a newer fix is attached, where error message is formatted
> >> properly.
> >
> >I would personally be OK with the change if the is_* are fixed as Honza
> >proposed, current way is not so Python-ic/readable. I.e.:
> >
> >Instead of
> >+            is_idna = True in [encodings.idna.ToASCII(x) != x for x in labels]
> >Use
> >+            is_idna = any(encodings.idna.ToASCII(x) != x for x in labels)
> >
> >Instead of
> >+                is_nonnorm = True in [encodings.idna.nameprep(x) != x for x in
> >labels]
> >use
> >+                is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
> >
> >However, we can wait till Monday for Martin2's feedback.
> I've fixed this and also made a proper split on all dots that could
> separate labels according to RFC3490:
> 
>     U+002E ( . ) FULL STOP
>     U+FF0E ( ． ) FULLWIDTH FULL STOP
>     U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
>     U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
> 
> 
> _______________________________________________
> Freeipa-devel mailing list
> Freeipa-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/freeipa-devel

Hi,
I analyzed how python detects IDNA labels.

Python tests if domain is IDNA in this way:

def ToASCII(label):
    try:
        # Step 1: try ASCII
        label = label.encode("ascii")
    except UnicodeError:
        pass
    else:
        # Skip to step 3: UseSTD3ASCIIRules is false, so
        # Skip to step 8.
        if 0 < len(label) < 64:
            return label
        raise UnicodeError("label empty or too long")

    # Step 2: nameprep
    label = nameprep(label)
...

We can use 'label = label.encode("ascii")' to detect if IDNA is needed,
without idna.ToASCII() conversion, and then use:

is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)

-- 
Martin^2 Basti