
Internationalizing Domain Names in Applications

Internationalizing Domain Names in Applications (IDNA) is a mechanism defined in 2003 for handling internationalized domain names containing non-ASCII characters. Such domain names could not be handled by the existing DNS and name resolver infrastructure. Rather than redesigning the existing DNS infrastructure, it was decided that non-ASCII domain names should be converted to a suitable ASCII-based form by web browsers and other user applications; IDNA specifies how this conversion is to be done.

IDNA was designed for maximum backward compatibility with the existing DNS, which was built for names that use only a subset of the ASCII character set.

An IDNA-enabled application can convert between the restricted-ASCII and non-ASCII representations of a domain. It uses the ASCII form where that is required (such as for DNS lookup) and presents the more readable non-ASCII form to users. Applications that do not support IDNA cannot handle domain names with non-ASCII characters, but can still access such domains if given the (usually rather cryptic) ASCII equivalent.

ICANN issued guidelines for the use of IDNA in June 2003, and it was already possible to register .jp domains using this system in July 2003. Several other top-level domain registries started accepting registrations in March 2004.

Mozilla 1.4, Netscape 7.1 and Opera 7.11 were among the first applications to support IDNA.

ToASCII and ToUnicode

The conversions between ASCII and non-ASCII forms of a domain name are accomplished by algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example and com, and ToASCII or ToUnicode would be applied to each of these three separately.
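The per-label application described above can be sketched in a few lines of Python. The helper name map_labels is hypothetical, and splitting on "." alone is a simplification: IDNA also treats a few other dot-like characters (such as the ideographic full stop U+3002) as label separators.

```python
from typing import Callable


def map_labels(domain: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to each dot-separated label and rejoin the results.

    Hypothetical helper: real IDNA also accepts other dot-like separators.
    """
    return ".".join(convert(label) for label in domain.split("."))


# str.upper stands in for ToASCII or ToUnicode to show that each
# label is converted on its own.
print(map_labels("www.example.com", str.upper))
```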

The details of these two algorithms are complex, and are specified in RFC 3490 (IDNA), with Nameprep defined in RFC 3491 and Punycode in RFC 3492. The following gives an overview of their behaviour.

ToASCII leaves unchanged any ASCII label, but will fail if the label is unsuitable for DNS. If given a label containing at least one non-ASCII character, ToASCII will apply the Nameprep algorithm (which converts the label to lowercase and performs other normalization) and will then translate the result to ASCII using Punycode before prepending the 4-character string "xn--". This 4-character string is called the ACE prefix, where ACE means ASCII Compatible Encoding, and is used to distinguish Punycode-encoded labels from ordinary ASCII labels. Note that the ToASCII algorithm can fail in a number of ways; for example, the final string could exceed the 63-character limit that the DNS places on each label. A label on which ToASCII fails cannot be used in an internationalized domain name.
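The overview above can be approximated with Python's standard library, which exposes a Nameprep implementation (encodings.idna.nameprep) and a "punycode" codec. This is only a rough sketch under stated assumptions: the name to_ascii_sketch is hypothetical, and several steps of the real algorithm are omitted, such as the optional host-name character checks and the check that the input does not already begin with the ACE prefix.

```python
from encodings.idna import nameprep  # stdlib Nameprep implementation

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of one label


def to_ascii_sketch(label: str) -> str:
    """Simplified illustration of ToASCII for a single label."""
    if label.isascii():
        # ASCII labels are passed through unchanged.
        result = label
    else:
        prepared = nameprep(label)  # lowercasing and other normalization
        if prepared.isascii():
            # Normalization alone may already yield an ASCII label.
            result = prepared
        else:
            result = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    if not 0 < len(result) <= MAX_LABEL_LENGTH:
        raise ValueError(f"label length {len(result)} is outside 1..63")
    return result


print(to_ascii_sketch("Bücher"))
```

A label whose encoded form is longer than 63 characters makes this sketch raise ValueError, mirroring one of the failure modes mentioned above.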

ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and applying the Punycode decode algorithm. It does not reverse the Nameprep processing, since that is merely a normalization and is by nature irreversible. Unlike ToASCII, ToUnicode always succeeds, because it simply returns the original string if decoding would fail. In particular, this means that ToUnicode has no effect on a string that does not begin with the ACE prefix.
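The "always succeeds" behaviour can be illustrated with a small Python sketch built on the standard "punycode" codec. The name to_unicode_sketch is hypothetical, and the sketch leaves out the verification step of the full algorithm, which re-encodes the decoded label and compares it with the input.

```python
ACE_PREFIX = "xn--"


def to_unicode_sketch(label: str) -> str:
    """Simplified illustration of ToUnicode: never raises."""
    if not label.lower().startswith(ACE_PREFIX):
        # No ACE prefix: the label is returned untouched.
        return label
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Decoding failed: fall back to the original string.
        return label


print(to_unicode_sketch("xn--bcher-kva"))
print(to_unicode_sketch("example"))
```

Note that the decoded result is the normalized lowercase form, since the Nameprep step is not undone.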

Example

As an example of how IDNA works, suppose the domain to be encoded is Bücher.ch. This has two labels, Bücher and ch. The second label is pure ASCII, and so is left unchanged. The first label is processed by Nameprep to give bücher, and then by Punycode to give bcher-kva, and then has xn-- prepended to give xn--bcher-kva. The final domain suitable for use with the DNS is therefore xn--bcher-kva.ch.
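The same conversion can be reproduced with Python's built-in "idna" codec, which implements the 2003 version of IDNA and processes a whole domain name label by label. The snippet is offered as a quick check of the example rather than as a reference implementation.

```python
domain = "Bücher.ch"

# Encode the full domain name to its ASCII-compatible form.
ascii_form = domain.encode("idna").decode("ascii")
print(ascii_form)  # xn--bcher-kva.ch

# Decoding gives back the normalized (lowercase) Unicode form.
unicode_form = ascii_form.encode("ascii").decode("idna")
print(unicode_form)  # bücher.ch
```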

Spoofing concerns

Because IDNA allows websites to use full Unicode names, it also makes it possible to create a spoofed web site that looks exactly like another, including domain name and security certificate, but is in fact controlled by someone attempting to steal private information. Such spoofing leaves users open to phishing attacks.

The spoofing attacks exploit the fact that different Unicode characters in different languages can look the same, depending on the font used. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in English. Technically, characters that look alike in this way are known as homographs.
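Although the two letters may render identically, they are distinct code points. A short Python check using the standard unicodedata module makes the difference visible:

```python
import unicodedata

cyrillic_a = "\u0430"
latin_a = "\u0061"

for ch in (cyrillic_a, latin_a):
    # Print the code point and its official Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The glyphs may match on screen, but the strings do not compare equal.
print(cyrillic_a == latin_a)  # False
```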

Although the browser may display identical glyphs for each character, it uses differing representations (in plain text, Unicode or Punycode) when locating the web sites or validating certificates. As a result, someone could register a domain name that appears identical to an existing domain but goes somewhere else. For example, the spoofed domain "pаypal.com" contains a Cyrillic a, not a Latin a.
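Encoding both names with Python's built-in "idna" codec (which implements the 2003 version of IDNA) shows the differing representations used for lookup. This is an illustrative sketch; the variable names are arbitrary.

```python
spoofed = "p\u0430ypal.com"  # second letter is Cyrillic U+0430
genuine = "paypal.com"       # all Latin letters

spoofed_ascii = spoofed.encode("idna").decode("ascii")
genuine_ascii = genuine.encode("idna").decode("ascii")

print(spoofed_ascii)  # xn--pypal-4ve.com
print(genuine_ascii)  # paypal.com
```

The DNS sees two unrelated names, even though a user may see the same text in the address bar.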

On February 7, 2005, Slashdot reported that this exploit had been disclosed at the hacker conference ShmooCon, with an example available at http://www.shmoo.com/idn/. On browsers supporting IDNA, the URL "https://www.pаypal.com/" appears to lead to paypal.com but instead leads to a spoofed PayPal web site that says "Meeow." Mozilla Firefox, which supports IDNA, shows the page as being at paypal.com and with a verified security certificate. Firefox displays no warnings of any sort.

It is possible to work around this problem in Firefox, Mozilla and other Gecko-based browsers by turning off IDN support entirely. To do this, type "about:config" into the address bar to bring up the list of browser settings. Then find the "network.enableIDN" setting and change its value to "false". The browser will then report IDN URLs as nonexistent. Note that on some versions (in particular, Firefox 1.0), this workaround is effective only for the first session: after the browser is closed and restarted, the user is vulnerable again, even though the option remains disabled. This can be corrected by clearing the browser's cache.

On February 8, 2005, a cross-browser method to disable IDN was released by Michael Scovetta of Scovetta Labs. It involves creating a custom proxy auto-configuration script that will deny access to any sites using IDN in the host name. A walkthrough tutorial is available at http://www.scovettalabs.com/download/IDNWalkthru.pdf, and a proxy.pac file is available at http://www.scovettalabs.com/download/IDNproxy.pac. This script has been reported to work for Firefox, Opera, and Mozilla across all platforms, but will need to be modified if a proxy is already in use.
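The kind of host-name test such a script might perform can be sketched as follows. This is not the content of the proxy.pac file mentioned above (proxy auto-configuration scripts are written in JavaScript, and that file is not reproduced here); it is a Python illustration of one plausible check, and the function name uses_idn is hypothetical.

```python
ACE_PREFIX = "xn--"


def uses_idn(host: str) -> bool:
    """Return True if any label is non-ASCII or carries the ACE prefix."""
    return any(
        not label.isascii() or label.lower().startswith(ACE_PREFIX)
        for label in host.split(".")
    )


print(uses_idn("www.paypal.com"))
print(uses_idn("www.xn--pypal-4ve.com"))
```

A filter built on such a test blocks every internationalized name, legitimate or not, which is the trade-off of disabling IDN outright.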

On February 17, 2005, Mozilla developers announced that they would ship their next versions with IDN support still enabled, but displaying the Punycode form of URLs instead, thus thwarting such attacks while still allowing people to access websites on an IDN domain. This was a change from the earlier plan to disable IDN entirely for the time being. The developers said they were working to find a long-term resolution to the problem. [1]

The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.
