Wednesday, November 29, 2006

Typing Tifinagh

Tifinagh is a script used for some Berber languages, in particular Tamazight in Morocco. For more info see this page. An experimental keyboard, Tifinagh.keylayout, is available on my iDisk.

Fonts which contain Tifinagh include Code2000, Hapax Berbère, Hapax Touareg, Hapax Touareg DàG, MPH 2B Damase.

Tuesday, November 28, 2006

Typing Shavian

Shavian is a script, named after G. B. Shaw, that was designed to represent English with simple, phonetic characters. For more info see this page. "Shavian" written in Shavian looks like this:



There are only three fonts that contain the Unicode Shavian range: Code2001, Andagii, and MPH 2B Damase. For input I have made an experimental Shavian.keylayout available on my iDisk.

Monday, November 27, 2006

Typing Mongolian (Cyrillic)

Mongolian Cyrillic uses the same alphabet as Russian, but with two extra characters, Өө and Үү. Unfortunately these are not included as options in any of the Cyrillic keyboard layouts included with OS X, so you need to use the Character Palette or install a custom layout. You can download MongolianCYR and MongolianQWERTY layouts that do have them here.
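If you go the Character Palette route, it helps to know exactly which code points you are looking for. The following is only a small sketch, assuming Python 3 is available; the helper name describe is my own, and the four characters are written as escapes so that they cannot be confused with similar-looking letters from other blocks.

```python
import unicodedata

# The two extra Mongolian Cyrillic letters, upper and lower case,
# written as escapes: U+04E8, U+04E9, U+04AE, U+04AF.
EXTRA_LETTERS = "\u04e8\u04e9\u04ae\u04af"

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in EXTRA_LETTERS:
    print(ch, describe(ch))
```

The names printed are the official Unicode character names, which are also what a search in the Character Palette should match.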

Lucida Grande is the only font that comes with OS X that has the two extra characters. If you have Office 2004, then the Arial, Monaco, Times, and Times New Roman that come with it should also work. Others you can download are Everson Mono, Charis SIL, Doulos SIL, Code2000, and Gandhari Unicode. Some MS Office Chinese fonts have them, but these are double-width.

Some users have reported that the keyboard works in every app except MS Word. If that is a problem for you, try typing some Mongolian in another app and then copy/pasting into Word. This may force it to recognize the keyboard.

Saturday, November 25, 2006

Typing Yi/Lolo

Yi (or Lolo) is spoken by 4-5 million people in SW China and is written using 1165 characters representing individual syllables. For some interesting info, including lists of the syllables and sample text, see the Babelstone Yi Page.

The logical way to type Yi is with an input method like those used for pinyin Chinese. OS X includes a facility for creating custom IMs, so I made an experimental one for Yi. You type in the Latin letters for the syllable, hit return, and the right character is placed in your text.

To install this IM, download the file yi16.txt.dat from my iDisk. Then use the "Generate IM Plug-in" command in the Traditional Chinese input method menu, and afterwards select Yi in that same menu. If you are using something earlier than Tiger, you follow a different procedure; see this page for details.

Friday, November 24, 2006

For Unicode Sanskrit, Try OpenOffice

Doing Unicode Sanskrit in OS X apps like TextEdit or Pages faces two problems: You are restricted to the one font Devanagari MT, and this font cannot do some conjunct forms or handle the stress marks for Vedic texts correctly. My tests indicate that the app OpenOffice/X11 (not OpenOffice 3) can use Windows fonts to display correct Devanagari and some of these, like Sanskrit 2003, will position stress marks as they should be and also create proper conjuncts. A screenshot of the first line of the Rig Veda showing the stress marks can be found here.

Aurebesh?

Aurebesh is not a language or script, but an alternative alphabet for English which is used in the Star Wars Saga. It is not in Unicode and never will be, but you can play with it by downloading a font like this one, and using it while typing normally with the US keyboard layout. My name in Aurebesh looks like this:



Here is some more info.

Thursday, November 23, 2006

Doing Sanskrit with Dvorak

Anyone who wants to input Devanagari and transliterated Sanskrit using Dvorak keyboard layouts is welcome to try those located here in the folder sanskritdvorak. Our thanks to Paul Alix and also David Mundie.

Wednesday, November 22, 2006

Why Does My French Turn Into Chinese?

One of the common questions I see in the Apple Mail forum is from people using European languages who find their messages contain strange Chinese characters when received on a PC with Windows Outlook.

Here is my understanding of how this happens.

Certain kinds of messages are sent by Mail with two copies: one in plain text with the charset UTF-8, and one in HTML with the charset Latin-1. There appear to be two bugs in Outlook. The first one causes it to confuse the two encodings and read Latin-1 characters beyond ASCII in the HTML copy as if they were UTF-8. So, for example in the French phrase

pensé qu'il

it sees the é + space + q as a series of 3 bytes, E9 20 71, forming one character. (In UTF-8 a lead byte whose first hex digit is E signals a 3-byte character.)

E9 20 71 is not in fact a valid UTF-8 sequence, but Windows or Outlook has another bug: It doesn't care whether the sequence is valid or not. It looks at the binary for the last two bytes of this sequence, which is

(E9) 00100000 01110001

and only reads the last 6 bits of each of them, assuming that the first 2 are 10 (which is what valid UTF-8 should normally have) instead of 00 and 01. So it interprets this as (E9) 10100000 10110001 or E9 A0 B1, which is valid UTF-8 for 頱. Thus "pensé qu'il" becomes "pens頱u'il."
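The arithmetic can be checked with a short script. This is only a sketch of the behavior as I understand it, assuming Python 3; the function lenient_decode_3byte is my own illustration of a decoder that skips validation, not Outlook's actual code.

```python
def lenient_decode_3byte(b0, b1, b2):
    """Decode three bytes as one 3-byte UTF-8 character WITHOUT checking
    that the second and third bytes begin with the bits 10.
    Hypothetical model of the bug, not taken from any real decoder."""
    code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
    return chr(code_point)

# The HTML copy of the message is in Latin-1, where é is the single byte E9.
latin1_bytes = "pens\u00e9 qu'il".encode("latin-1")

# Misread the bytes as UTF-8: E9 announces a 3-byte character,
# so it swallows the space (20) and the q (71) that follow it.
i = latin1_bytes.index(0xE9)
garbled = (
    latin1_bytes[:i].decode("ascii")
    + lenient_decode_3byte(*latin1_bytes[i:i + 3])
    + latin1_bytes[i + 3:].decode("ascii")
)
print(garbled)

# A strict decoder refuses the same three bytes.
try:
    bytes([0xE9, 0x20, 0x71]).decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
print("strict decoder rejects E9 20 71:", strict_rejected)
```

Masking with 0x3F keeps only the last 6 bits of each trailing byte, which is why 20 and 71 give the same result as the valid continuation bytes A0 and B1.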

Other accented characters may give different results, including question marks or complete absence of the character.

I don't know whether Vista will have the same behavior.

For fixes for this problem, see this note.

Unicode CJK Extension C

Are you dying to know what Chinese characters will likely be added to Unicode when CJK Extension C is approved? The contents of the current proposal for 4000+ new characters can be found at these URLs:

http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134.doc
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A1.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A2.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A3.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134AB.xls

Vietnamese in OpenOffice/X11

A poster in the Apple Forums has pointed out that none of the standard keyboards used for Vietnamese input will work in OpenOffice, because it doesn't recognize their deadkeys, which are essential for typing the large number of diacritics used in this language. It turns out that you have to make a custom X11 keymapping file in order to do Vietnamese in this app. The basic info is contained in this note:

Making a Keyboard for OpenOffice/X11

And more details can be found here.

Tibetan in OS X

Until recently, I thought that the only way to do correct Unicode Tibetan in OS X was to purchase the Tibetan language kit from XenoTypeTech. This is because OS X requires an AAT font for this script, and that was the only source. But I have discovered that the program OpenOffice/X11 can display correct Tibetan using free Windows OpenType fonts, and that there are some free keyboard layouts which work with OpenOffice. For full info, see my note at

Typing Tibetan

Missing Keyboards

The following scripts are included in Unicode 5.0 and have fonts available, but as yet have no Mac input keyboard:

Syloti Nagri, Kharoshthi, Tagalog, Hanunoo, Buhid, Deseret

Unicode 5 Scripts

Unicode 5, released August 2006, includes 5 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboard, the characters can of course still be entered directly from the Character Palette.

N'ko

Fonts: Code2000
Keyboards: Xenotypetech

Phoenician

Fonts: ALPHABETUM Unicode, Code2001 and MPH 2B Damase
Keyboards: Phoenician.keylayout

Phags-Pa

Fonts: Babelstone
Keyboards: Phags-pa.keylayout

Balinese

Fonts: None
Keyboards: None

Sumero-Akkadian Cuneiform

Fonts: (Hittite glyphs) FreeIdg
Keyboards: None

Unicode 4.1 Scripts

Unicode 4.1, released in the Spring of 2005, included 8 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette. But it seems like we should have more keyboards by now.

Buginese

Fonts: Code2000, MPH 2B Damase
Keyboards: Xenotypetech

Glagolitic

Fonts: Dilyana, MPH 2B Damase
Keyboards: Redlers.com

Coptic

Fonts: ALPHABETUM Unicode, Code2000, MPH 2B Damase, New Athena Unicode
Keyboards: Coptic2005

Tifinagh

Fonts: Code2000, Hapax Berbère, Hapax Touareg, Hapax Touareg DàG, MPH 2B Damase
Keyboards: Tifinagh.keylayout

Syloti Nagri

Fonts: MPH 2B Damase
Keyboards: none

Old Persian

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase
Keyboards: OPersian

Kharoshthi

Fonts: ALPHABETUM Unicode, MPH 2B Damase
Keyboards: none

New Tai Lue

Fonts: none
Keyboards: none

Unicode 4.0 Scripts

Unicode 4.0, released in the Spring of 2003, included 7 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Limbu

Fonts: Code2000, MPH 2B Damase
Keyboards: Xenotypetech


Tai Le

Fonts: Fixedsys Excelsior, MPH 2B Damase, Tai Le Valentinium
Keyboards: Xenotypetech


Linear B

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase, Penuturesu
Keyboards: LinearB.keylayout


Cypriot

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase
Keyboards: Cypriot.keylayout

Ugaritic

Fonts: ALPHABETUM Unicode, Andagii, Code2001, MPH 2B Damase
Keyboards: Ugaritic.keylayout

Osmanya

Fonts: Andagii, Code2001, MPH 2B Damase
Keyboards: Xenotypetech

Shavian

Fonts: Andagii, Code2001, MPH 2B Damase
Keyboards: Shavian.keylayout

Unicode 3.2 Scripts

Unicode 3.2, released in March 2002, included 4 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Tagalog

Fonts: Baybayin Lopez, Bikol Mintz, Bisaya Hervas, Fixedsys Excelsior, Tagalog Doctrina 1593, Tagalog Stylized
Keyboards: none


Hanunoo

Fonts: MPH 2B Damase
Keyboards: none


Buhid

Fonts: Code2000
Keyboards: none


Tagbanwa

Fonts: Tagbanwa Font
Keyboards: Tagbanwa.keylayout

Unicode 3.1 Scripts

Unicode 3.1, released in March 2001, included 3 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Old Italic

Fonts: ALPHABETUM Unicode, Cardo, Code2001, MPH 2B Damase
Keyboards: Redlers.com


Gothic

Fonts: ALPHABETUM Unicode, Cardo, Code2001, MPH 2B Damase, Vulcanius
Keyboards: Gothic.keylayout


Deseret

Fonts: Code2001, MPH 2B Damase, Apple Symbols
Keyboards: None

Unicode 3.0 Scripts

Unicode 3.0, released in September 1999, included 10 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Syriac

Fonts: Beth Mardutho
Keyboards: Pormann and AramaicNT


Thaana

Fonts: Code2000, Free Serif, MPH 2B Damase, MV Boli, Mv Elaaf, Mv GroupX Avas, Mv Iyyu, Mv Lady Luck, Mv MAG Round, Mv Sega, Thaana Unicode Akeh, TITUS Cyberbit Basic
Keyboards: Quinon


Sinhala

Fonts: Xenotypetech
Keyboards: Xenotypetech


Myanmar

Fonts: Xenotypetech
Keyboards: Xenotypetech

Ethiopic

Fonts: SIL
Keyboards: SIL

Ogham

Fonts: ALPHABETUM Unicode, Caslon, Code2000, Everson Mono Unicode, Fixedsys Excelsior, TITUS Cyberbit Basic, Beith-Luis-Nion, Beth-Luis-Nion, Cog, Craobh Ruadh, Crosta, Everson Mono Ogham, Maigh Nuad, Pollach, Ragnarok Ogham, TITUS Ogham
Keyboards: Evertype

Runic

Fonts: ALPHABETUM Unicode, Cardo, Caslon, Chrysanthi Unicode, Code2000, Everson Mono Unicode, Fixedsys Excelsior, Free Monospaced, Hnias, Junicode, TITUS Cyberbit Basic
Keyboards: Thomaswebb Rune Keyboard

Khmer

Fonts: Xenotypetech
Keyboards: Xenotypetech

Mongolian

Fonts: Code2000, NSimSun-18030, SimSun-18030, STFangsong, STHeiti, STKaiti, STSong
Keyboards: Manchu Keyboard

Yi

Fonts: Code2000, NSimSun-18030, SIL Yi, SimSun-18030, STFangsong, STHeiti, STKaiti, STSong
Keyboards: yi16.txt.dat

Unicode 1.0 and 2.0 Scripts

Unicode 1.0, released in October 1991, and 2.0, released in July 1996, included 8 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Bengali

Fonts: Ekushey
Keyboards: Ekushey


Oriya

Fonts: None
Keyboards: None


Telugu

Fonts: Nick Shanks and Xenotypetech
Keyboards: Xenotypetech and Telugu.keylayout


Kannada

Fonts: Nick Shanks and Xenotypetech
Keyboards: Nick Shanks and Xenotypetech

Malayalam

Fonts: Xenotypetech
Keyboards: Xenotypetech, and here.

Lao

Fonts: Alice0–Alice5, Arial Unicode MS, Code2000, JG Basic Lao, JG Chantabouli Lao, JG Lao Old Arial, JG Lao Oldface, JG LaoTimes, Lao Unicode, Phetsarath OT, Saysettha OT, Saysettha Unicode, VanVieng Unicode, XiengThong Unicode
Keyboards: lao.keylayout

Georgian

Fonts: Arial Unicode MS, BPG Classic 99U, BPG Paata Khutsuri U, Code2000, Everson Mono Unicode, MPH 2B Damase, Sylfaen, TITUS Cyberbit Basic
Keyboards: Apple Georgia and Quinon

Tibetan

Fonts: Xenotypetech
Keyboards: Xenotypetech, and here.

Missing Scripts

Below is a list of the scripts included in Unicode 5.0 but not yet part of the fonts/keyboards supplied with OS X 10.4. In many cases these scripts can nonetheless be used on the Mac by downloading or purchasing components from the Internet.

N'ko, Phoenician, Balinese, Phags-Pa, Sumero-Akkadian Cuneiform, Buginese, Glagolitic, Coptic, Tifinagh, Syloti Nagri, Old Persian, Kharoshthi, New Tai Lue, Limbu, Tai Le, Linear B, Cypriot, Ugaritic, Osmanya, Shavian, Tagalog, Hanunoo, Buhid, Tagbanwa, Old Italic, Gothic, Deseret, Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Ogham, Runic, Khmer, Mongolian, Yi, Bengali, Oriya, Telugu, Kannada, Malayalam, Lao, Georgian, Tibetan.

Apple Internationalization Status (2006)

The OS X user interface supports the following language localizations for menus and dialogues: English, Japanese, French, German, Spanish, Italian, Dutch, Swedish, Danish, Norwegian, Finnish, Traditional Chinese, Simplified Chinese, Korean, and Brazilian Portuguese. Russian is also available for download from apple.ru.

OS X display can handle any language covered by Unicode and for which an appropriate font has been installed, although individual apps may have lesser capabilities. Apple itself provides input keyboards and fonts for Arabic, Azeri, Armenian, Bulgarian, Byelorussian, Catalan, Cherokee, Chinese (simplified and traditional), Croatian, Czech, Danish, Dari, Devanagari, Dutch, English, Estonian, Faroese, Finnish, French, German, Greek (regular and polytonic), Gujarati, Gurmukhi (Punjabi), Hawaiian, Hebrew, Hungarian, Icelandic, Inuktitut, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Maori, Nepali, Northern Sami, Norwegian, Pashto, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Uzbek, Vietnamese, and Welsh.

The iPod user interface supports Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Simplified Chinese, Spanish, Swedish, Traditional Chinese and Turkish.

iPod display of song info and notes covers the interface languages plus Bulgarian, Croatian, Romanian, Serbian, Slovak, Slovenian and Ukrainian.