Thursday, April 17, 2008

Internationalized Domain Names

In recent months, Internationalized Domain Names -- URLs and email addresses written in scripts other than Latin -- have been set up for testing by ICANN. You can check whether your browser handles the IDNA protocol these names use by clicking on the links at the bottom of this page. You can similarly test your email client here.

Note the difference in what appears in the browser address bar when you point Safari at the topmost (Arabic) site and at the Russian site (ninth down). The former is shown in the native script, while the latter is shown in an ASCII encoding called Punycode. This is done because Cyrillic and Latin letters can be confused with one another, which creates security problems. Which scripts are displayed natively rather than as Punycode is determined by a "whitelist" in the Safari app. Info on this and other aspects of Safari support for IDNA can be found here.
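
To make the whitelist idea concrete, here is a minimal Python sketch of that kind of decision. It is not Safari's actual code: the function names, the contents of ALLOWED_SCRIPTS, and the trick of guessing a character's script from the first word of its Unicode name are all assumptions made for illustration. The Punycode conversion uses Python's built-in "idna" codec.

```python
import unicodedata

# Hypothetical whitelist for illustration only; not Safari's real list.
ALLOWED_SCRIPTS = {"ARABIC"}


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.

    A heuristic for this sketch, not a real script-property lookup.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def to_punycode(hostname):
    """Return the ASCII (xn--...) form of a host name via the idna codec."""
    return hostname.encode("idna").decode("ascii")


def display_form(hostname, allowed_scripts=ALLOWED_SCRIPTS):
    """Show the native script only if every non-ASCII character belongs
    to a whitelisted script; otherwise fall back to Punycode."""
    for ch in hostname:
        if ord(ch) < 128:
            continue  # plain ASCII letters, digits, dots, hyphens
        if script_of(ch) not in allowed_scripts:
            return to_punycode(hostname)
    return hostname
```

With this whitelist, display_form("пример.испытание") returns an ASCII string whose labels start with "xn--", while an all-Arabic host name is returned unchanged.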

IDNA is currently limited to the scripts included in Unicode 3.2, released in 2002. Since then nearly 30 more have been added, and the IETF is working on an update that will accommodate Unicode 5.1 and any future version.
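
If you want to check whether a particular character falls inside the Unicode 3.2 repertoire, Python happens to ship a copy of the 3.2 character database as unicodedata.ucd_3_2_0. The sketch below uses it; the helper name is made up for illustration, and treating the category "Cn" as "not assigned" is the assumption it rests on.

```python
import unicodedata


def assigned_in_unicode_3_2(ch):
    """True if the character already had an assignment in Unicode 3.2.

    The 3.2 database reports category "Cn" (unassigned) for code points
    that were added in later versions.
    """
    return unicodedata.ucd_3_2_0.category(ch) != "Cn"


# U+0430 CYRILLIC SMALL LETTER A has been in Unicode since long before 3.2.
cyrillic_a = "\u0430"
# U+07CA NKO LETTER A belongs to N'Ko, a script added in Unicode 5.0.
nko_a = "\u07ca"

print(assigned_in_unicode_3_2(cyrillic_a))
print(assigned_in_unicode_3_2(nko_a))
```

A character that fails this check belongs to the part of Unicode that the current IDNA cannot yet cover.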


Leif Halvard Silli said...

To discriminate against one script, Cyrillic, because it looks similar, now and then, to Latin is - well - discrimination. And, btw, the Latin letters also resemble the Cyrillic ones ...

The issue would be better solved by using a font in the URL field that accentuates script differences instead of script likenesses. For instance, there are older forms of Cyrillic letters, which are not so easily taken for Latin letters. Etc.

Additionally, the browser could pay special attention to the mixing of different scripts in the same word, etc., and use a warning color in such cases to let the user understand that this or that letter must be interpreted differently from the surrounding letters.

The same would also have to apply to numbers.

Tom Gewecke said...

You might want to convey your ideas to the people who make browsers, but I don't think they are practical myself.

The default Safari whitelist excludes Cherokee and Greek in addition to Cyrillic. You are totally free to add any or all of these to the list if you want to. I don't know offhand how Firefox and Opera handle the problem.

prasanna said...

hi, here is a site which has more than 100 international languages for Internationalized Domain Names: try international domain