[Freeipa-devel] [PATCH 0070] Normalization check only for IDNA domains

Fri Jun 27 11:03:29 UTC 2014

On Fri, 27 Jun 2014, Martin Kosek wrote:
>On 06/27/2014 12:10 PM, Alexander Bokovoy wrote:
>> On Fri, 27 Jun 2014, Petr Spacek wrote:
>>> On 27.6.2014 11:21, Jan Cholasta wrote:
>>>> On 27.6.2014 10:58, Alexander Bokovoy wrote:
>>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
>>>>>> On 27.6.2014 10:29, Alexander Bokovoy wrote:
>>>>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
>>>>>>>> On 27.6.2014 10:15, Alexander Bokovoy wrote:
>>>>>>>>> On Fri, 20 Jun 2014, Martin Basti wrote:
>>>>>>>>>> On Fri, 2014-06-20 at 10:32 +0200, Jan Cholasta wrote:
>>>>>>>>>>> On 18.6.2014 16:49, Martin Basti wrote:
>>>>>>>>>>>> Due to compability with older versions, only IDNA domains should be
>>>>>>>>>>>> checked
>>>>>>>>>>>> Patch attached.
>>>>>>>>>>>
>>>>>>>>>>> I'm not particularly happy about the u'\xdf' special case. Isn't
>>>>>>>>>>> there a
>>>>>>>>>>> better way to do this check?
>>>>>>>>>> I cant find better way. u'\xdf' is mapped to ss, and ss is not IDN
>>>>>>>>>> string.
>>>>>>>>>>
>>>>>>>>>> Or just remove this validation.
>>>>>>>>>>
>>>>>>>>>>> (BTW I really think this should be a warning, not an error, but that
>>>>>>>>>>> would require larger amount of work, so I guess it's OK for now.)
>>>>>>>>>> (More pain than gain)
>>>>>>>>> Main thing in this patch is that the check should not be done against
>>>>>>>>> non-IDN strings. I want this version of the patch to go in for that
>>>>>>>>> reason as currently you cannot even complete ipa-adtrust-install
>>>>>>>>> run due
>>>>>>>>> to IDN normalisation check being applied to non-IDN domains.
>>>>>>>>
>>>>>>>> On non-IDN domains, the only effect of IDN normalization is that it
>>>>>>>> lower-cases the names (right?), so the check should compare
>>>>>>>> lower-cased original name with the normalized name, instead of
>>>>>>>> special-casing certain characters etc.
>>>>>>> .. what's the reason to do such comparison then? lower-cased non-IDN
>>>>>>> name will be equal to lower-cased normalized non-IDN name by definition,
>>>>>>> so the check is not needed in this case, at all.
>>>>>>
>>>>>> The point is that it works for both IDN and non-IDN, without
>>>>>> u'\xdf'-style hacks.
>>>>> No, your proposal of comparing low-cased value and normalized value is
>>>>> not going to work because low-cased value is in general not equal to
>>>>> normalized value for IDN names, only for non-IDN ones, due to the fact
>>>>> that lower case for non-ASCII Unicode character may map to a completely
>>>>> different character than in normalization situation. Take, for example,
>>>>> Turkish alphabet where there are six letters with different case rules
>>>>> (uppercase dotted i, dottless lowercase i, upper- and lowercase G with
>>>>> breve accent, and upper- and lowercase S with cedilla), which will break
>>>>> your generalized check.
>>>>> So you'll anyway will need to split these cases.
>>>>>
>>>>
>>>> I see.
>>>>
>>>> I'm still not comfortable with carrying the bit of knowledge about u'\xdf' in
>>>> this particular spot. Can we check that a name is IDN some other way than
>>>> "domain_name.is_idn() or u'\xdf' in value"?
>>>
>>> Why can't we simply fix string constants in ipa-adtrust-install and avoid
>>> adding hacks for it?
>> Because they are correct, in the sense that they follow what is defined
>> for Active Directory. Yes, AD puts them in that case into DNS. There is
>> simply no reason to force lower case for non-IDN names.
>>
>> That said, a newer fix is attached, where error message is formatted
>> properly.
>
>I would personally be OK with the change if the is_* are fixed as Honza
>proposed, current way is not so Python-ic/readable. I.e.:
>
>Instead of
>+            is_idna = True in [encodings.idna.ToASCII(x) != x for x in labels]
>Use
>+            is_idna = any(encodings.idna.ToASCII(x) != x for x in labels)
>
>Instead of
>+                is_nonnorm = True in [encodings.idna.nameprep(x) != x for x in
>labels]
>use
>+                is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
>
>However, we can wait till Monday for Martin2's feedback.
I've fixed this and also made a proper split on all dots that could
separate labels according to RFC3490:

    U+002E ( . ) FULL STOP
    U+FF0E ( ． ) FULLWIDTH FULL STOP
    U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
    U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP


-- 
/ Alexander Bokovoy
-------------- next part --------------
>From 3d495c5f9358dc6708800ba62a9cbbc202010d86 Mon Sep 17 00:00:00 2001
From: Alexander Bokovoy <abokovoy at redhat.com>
Date: Fri, 27 Jun 2014 13:01:28 +0300
Subject: [PATCH 3/3] Perform IDNA normalization check only for IDN domain
 names

---
 ipalib/parameters.py | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/ipalib/parameters.py b/ipalib/parameters.py
index 1dff13c..e16b352 100644
--- a/ipalib/parameters.py
+++ b/ipalib/parameters.py
@@ -1965,12 +1965,15 @@ class DNSNameParam(Param):
             #compare if IDN normalized and original domain match
             #there is N:1 mapping between unicode and IDNA names
             #user should use normalized names to avoid mistakes
-            normalized_domain_name = encodings.idna.nameprep(value)
-            if value != normalized_domain_name:
-                error = _("domain name '%(domain)s' and normalized domain name"
-                          " '%(normalized)s' do not match. Please use only"
-                          " normalized domains") % {'domain': value,
-                          'normalized': normalized_domain_name}
+            labels = re.split(u'[.\uff0e\u3002\uff61]', value, flags=re.UNICODE)
+            is_idna = any(encodings.idna.ToASCII(x) != x for x in labels)
+            if is_idna:
+                is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
+                if is_nonnorm:
+                    error = _("domain name '%(domain)s' should be normalized to"
+                              " have each label in lower case: %(normalized)s") % {
+                              'domain': value,
+                              'normalized': [encodings.idna.nameprep(x) for x in labels].join('.')}
             if error:
                 raise ConversionError(name=self.get_param_name(), index=index,
                                       error=error)
-- 
1.9.3