Friday, December 29, 2006

????'s When You Post in a Forum With Safari?

Someone in the Apple discussions asked why his Greek and Cyrillic postings in another forum turned into question marks when he used Safari but displayed correctly when he used Firefox.

This occurs because some forums have the encoding of their web pages set to Latin-1 even though it is understood that members will post in languages that cannot be covered by that charset. When that is the case, a non-Latin character has to be converted into a "Numeric Character Reference (NCR)" escape code. For example, Greek alpha α becomes &#945; where the number is the decimal Unicode code point for the character. This is essentially an HTML kludge developed years ago, when computer and internet technology was so limited that the only safe way to display non-Latin characters was to convert them to such ASCII-only codes.
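As a rough illustration of what an NCR is, here is a minimal Python 3 sketch. The helper name to_ncr is made up for this example; html.unescape is the standard library call that turns a reference back into the character.

```python
import html


def to_ncr(ch: str) -> str:
    """Return the decimal numeric character reference for one character.

    Hypothetical helper for illustration only.
    """
    return f"&#{ord(ch)};"


alpha = "\u03b1"  # Greek small letter alpha

# The number in the reference is just the decimal code point.
print(to_ncr(alpha))

# A browser (or html.unescape) turns the reference back into the character.
print(html.unescape(to_ncr(alpha)) == alpha)
```

The reference itself contains only ASCII characters, which is why it survives on a page whose declared encoding cannot represent the original letter.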

It happens that Firefox, when faced with a page whose encoding cannot represent the characters being input, will automatically produce these NCRs instead of the real characters. Safari does not do that, so what it puts into the forum appears as question marks when viewed as Latin-1.
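The difference between the two outcomes can be imitated with Python's built-in codec error handlers. This is only a sketch of the behavior described above, not the code either browser actually runs, and the two function names are invented for the example.

```python
def post_with_question_marks(text: str) -> str:
    """Force text into Latin-1, replacing anything unrepresentable with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def post_with_ncrs(text: str) -> str:
    """Force text into Latin-1, replacing anything unrepresentable with an NCR."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


sample = "caf\u00e9 \u03b1\u03b2"  # "café" plus Greek alpha and beta

print(post_with_question_marks(sample))  # Greek letters are lost
print(post_with_ncrs(sample))            # Greek letters survive as references
```

In both cases the é is untouched because Latin-1 covers it; only the Greek letters are affected.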

If you must use Safari, download/install UnicodeChecker, set its Preferences/XHTML to Decimal, and use it to convert your non-Latin text into NCRs before posting. This can be done by selecting the text and going to Safari > Services > Unicode > Unicode to HTML Entities.

Forums intended to accommodate languages beyond English and those of W. Europe should, if at all possible, have the encoding UTF-8 and not Latin-1. The Apple forums themselves are a good example of the correct approach, where Safari works perfectly to input any language you want.

Tuesday, December 26, 2006

Foreign Language Broadcasts via Internet

For listening to foreign language news and other broadcasts, internet radio is taking the place that used to be occupied by short-wave listening for many people. A Google search will quickly bring you to the web pages of a huge number of stations that offer listening to their live broadcasts in RealAudio, WindowsMedia, or other formats. If you want a more systematic way of doing this, I recommend the Reciva Radio Portal, which has over 5000 stations in its database and lets you create customized lists of the ones you want to listen to.

For something more radio-like, check out the AE Wifi Internet Radio, which is essentially a small dedicated computer that connects to the Reciva Portal over your local wireless network. It's available from various sources in the U.S. I got one for Christmas and it works beautifully.

Sunday, December 24, 2006

Writing Phags-Pa

Phags-pa is a script once used for Mongolian and Chinese and occasionally today for Tibetan. Andrew West has just produced a font for this which can be downloaded here. A keyboard layout for OS X which mirrors Andrew's version for Windows can be gotten from my iDisk.

Unfortunately as far as I can tell neither OS X nor X11 apps can yet display Phags-pa totally correctly, only Windows. Here is a test page.

Update Nov. 2007: OS X 10.5 Leopard can display Phags-pa correctly in Safari and Pages, but not in TextEdit.

Thursday, December 21, 2006

Where Is The Romanian S-Comma?

A Mac user in California asked today where they could find the ș (s-comma) needed for Romanian in the OS X keyboard layout for that language. The answer is that this character happens to be on a key which does not exist on keyboards sold in the US (known as ANSI or 101 keyboards), but only on keyboards sold in Europe (known as ISO or 102).

The solution is to download and install an alternative Romanian layout. Two sources for one can be found here.

How could Apple do this? I don't have the details, but I think it may be because Macs sold in Romania have a physical keyboard labeled for use in more than one country/language in the region, which required some compromises in key placement. The ș wound up on the extra key, and the layout provided with OS X has to follow that, so Romanians in Romania can type their own language.

Wednesday, December 20, 2006

Problems Using PC Keyboards?

Recently someone was trying to type on his Mac with a PC Arabic keyboard and reported that the layout did not match what OS X was using. Unfortunately that's true for a lot of languages other than US English -- Mac and PC keyboards may have somewhat different layouts, and OS X only has software for the Mac versions. Since the creation of the Mac Mini more people seem to be using PC keyboards.

The solution is to install custom layouts that match the PC models. On my iDisk you can find some for Arabic, Russian, Azeri, Urdu, Mongolian, and Tamil. For various European languages, you can get layouts that may work better with PC keyboards from Logitech, as explained here. Beyond that, you can use Ukelele to make your own.

Monday, December 18, 2006

Macedonian Keyboard Error

Yesterday a poster in the Apple Forums pointed out that the Macedonian keyboard layout supplied by Apple has a mistake in it, where the character з (U+0437) has been replaced by э (U+044D). The uppercase version is OK, however. Also, the layout has a dead key which does not belong there. A corrected keyboard, Macedonianz.keylayout, is available on my iDisk.
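If you want to check which character a key is really producing, a few lines of Python 3 will print the code point and official Unicode name. This is just a quick sketch; the describe helper is made up for the example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


expected = "\u0437"  # the letter the key should produce
produced = "\u044d"  # the letter the faulty layout produces instead

print("expected:", describe(expected))
print("produced:", describe(produced))

# The uppercase form of the expected letter, for comparison.
print("uppercase:", describe(expected.upper()))
```

Paste the character you actually typed in place of the `produced` value to see what your own layout is giving you.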

I checked the Panther layout and found the same errors. It seems strange I've never seen this reported earlier. Is it possible that before now no one has used OS X to type Macedonian since Panther was released?

Friday, December 15, 2006

Writing Ancient Egyptian

The most common ways of representing Ancient Egyptian are hieroglyphs and their Latin transcription.

The total number of hieroglyphs, for example recorded in Hieroglyphica, is nearly 7000. They have not yet been put in Unicode, though a proposal to cover a basic set of about 1200 of them is in the works. In the meantime, the solution is to use custom non-Unicode fonts along with special editing programs that let you arrange the symbols in the various ways they are found naturally. On OS X, MacScribe can be used for this. An example of how it works can be seen here.

For Latin transcription alone, Unicode is possible. The most common standard alphabet currently used has 24 letters (all consonants, no vowels):

ȝ ỉ y ʿ w b p f m n r h ḥ ḫ ẖ s š ḳ k g t ṯ d ḏ

10 of these are not found in English. They can all be entered via the Character Palette in OS X, but it is a lot easier to use a custom keyboard, such as the EgyptTrans.keylayout you can download from my iDisk. There are also transcription systems which use only ASCII, such as found in the Manuel de Codage. MacScribe uses a system like this for input.
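To see which of the 24 transcription letters fall outside the plain English alphabet, and which code points they use, something like the following Python 3 sketch will do. It normalizes to NFC first, on the assumption that you want the precomposed forms; the variable names are arbitrary.

```python
import unicodedata

# The 24-letter transcription alphabet, separated by spaces.
ALPHABET = "ȝ ỉ y ʿ w b p f m n r h ḥ ḫ ẖ s š ḳ k g t ṯ d ḏ"

# Normalize each letter so base + combining mark collapses to one code point.
letters = [unicodedata.normalize("NFC", s) for s in ALPHABET.split()]

# Anything outside ASCII is a letter not found in English.
special = [s for s in letters if not s.isascii()]

print(len(letters), "letters in total,", len(special), "not in ASCII")
for s in special:
    print(s, " ".join(f"U+{ord(c):04X}" for c in s))
```

The code points printed are the ones to look up in the Character Palette or to assign in a custom keyboard layout.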

Egyptian was also written in the hieratic and demotic scripts and in Coptic. For the first two I am unaware of any Unicode proposals or fonts, but Coptic is in Unicode 4.1 and you can download a Coptic2005.keylayout from my iDisk. You also need one of the fonts that covers Coptic -- ALPHABETUM Unicode, Code2000, MPH 2B Damase, or New Athena Unicode.

Friday, December 8, 2006

Writing Esperanto

Esperanto, invented in 1887 by L.L. Zamenhof, is the most popular of the various artificial languages. For good info, check out the Wikipedia article.

Esperanto uses essentially the same alphabet as English, but with the extra letters ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. To type these in OS X, you can activate the US Extended keyboard layout in System Preferences/International/Input menu. The letters with the ^ (circumflex) over them can be typed by doing Option + 6 followed by the letter itself. The ŭ (u-breve) is made by doing Option + b followed by u.
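The "accent first, then letter" idea can be imitated in Python 3 by joining a base letter with a combining mark and normalizing to NFC. This is only a sketch of the end result, not of how OS X implements dead keys, and the dead_key function is a made-up name.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # combining circumflex accent
BREVE = "\u0306"       # combining breve


def dead_key(mark: str, base: str) -> str:
    """Combine an accent with a base letter into one precomposed character."""
    return unicodedata.normalize("NFC", base + mark)


# Circumflex on c, g, h, j, s, then breve on u.
letters = [dead_key(CIRCUMFLEX, b) for b in "cghjs"] + [dead_key(BREVE, "u")]

for ch in letters:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Each result is a single code point, which is what the keyboard layout inserts into your text.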

You can also download an Esperanto keyboard layout from my iDisk, which will let you type the special letters more easily (the accented characters are at Option + the base character).