
Reference for punycode-encoding of IDN #63

Closed
@domel

Description

According to the comment:

The real issue is that this is probably not just about updating references. With the warning that I have not studied this thread enough to be confident that I fully understand the issues involved, let me try to explain.

If the discussion is about internationalized domain names (IDNs), there should be no references to RFC 3490 or 3491 (aka "IDNA2003") at all. Such references should be replaced by references to one or more of RFC 5890ff and the surrounding text should be checked carefully to make sure it is not dependent on assumptions or conventions of IDNA2003 (RFC 3490/3491) that were changed in IDNA2008 (RFC 5890ff).

Now, as far as Punycode is concerned, IDNA2008 generally, and 5891 in particular, do not change the algorithm itself at all (see RFC 5891 Section 4.4). What it changes are the references inside 3492 that point to IDNA2003, which are updated to point to IDNA2008. However, depending on how you are using "punycode" terminology, that is fairly important. The issue goes back to the core differences between IDNA2003 and IDNA2008. Conceptually, IDNA2008 is not a tuned and updated version of IDNA2003. It is as much of a change as the change of the default CCS for HTML from ISO 8859-1 to Unicode. In the particular case of IDNs, one of those conceptual differences is that IDNA2003 was quite permissive as to what names were valid, even allowing different Unicode strings to convert to the same string after Punycode conversion although (obviously) the reverse conversion could only produce one of those strings. IDNA2008 is much more focused on identifier integrity and restrictions intended to reduce or eliminate ambiguity, mapping problems, etc. As a result, the "to_ascii" and "to_unicode" terminology and, if one needs to be precise, "Punycode string" terminology, are no longer useful and, IIRC, do not appear in the IDNA2008 documents at all. And strings that can be generated by the Punycode algorithm that are not valid IDNA2008 labels are, to use a technical term, garbage: there is no way to talk about their validity. So, in general, contemporary documents should not be referring to Punycode at all, especially as a type of string, but should be using the IDNA2008 terminology of "A-labels" and "U-labels", defined in RFC 5890.
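To make the permissiveness described in the comment concrete, here is a minimal sketch using Python's built-in codecs. It assumes only the standard library, whose `punycode` codec implements RFC 3492 and whose `idna` codec implements IDNA2003 (RFC 3490), not IDNA2008. The helper names and the sample label are illustrative and do not come from the spec or the comment.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA2003 (RFC 3490),
# so it shows the older, permissive behaviour. It is not an IDNA2008 validator.

def raw_punycode(label: str) -> bytes:
    """Bare Punycode algorithm (RFC 3492): no mapping step, no "xn--" prefix."""
    return label.encode("punycode")


def idna2003_to_ascii(label: str) -> bytes:
    """IDNA2003 conversion: Nameprep mapping, Punycode, then the "xn--" prefix."""
    return label.encode("idna")


def idna2003_to_unicode(ascii_label: bytes) -> str:
    """Reverse conversion of an "xn--" label back to Unicode."""
    return ascii_label.decode("idna")


if __name__ == "__main__":
    # The bare algorithm output is not itself a domain label.
    print(raw_punycode("bücher"))

    # Two different Unicode inputs collapse to the same ASCII form...
    lower = idna2003_to_ascii("bücher")
    upper = idna2003_to_ascii("Bücher")
    print(lower, upper, lower == upper)

    # ...and the reverse conversion can only return one of them.
    print(idna2003_to_unicode(lower))
```

In IDNA2008 terms, `xn--bcher-kva` would be the A-label and `bücher` the corresponding U-label; `Bücher` is not a U-label, which is the kind of many-to-one mapping the comment says IDNA2008 set out to avoid.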

I think the issue is not only a matter of changing

Punycode-encoding of Internationalized Domain Names in IRIs [RFC3492]

to

Punycode-encoding of Internationalized Domain Names in IRIs [RFC5890]

I think we should rephrase some parts of the green box in the spec and then change the reference.


Labels

- i18n-needs-resolution: Issue the Internationalization Group has raised and looks for a response on.
- spec:substantive: Change in the spec affecting its normative content (class 3); see also spec:bug, spec:new-feature.
