Saturday, February 24, 2007

Missing Vietnamese Characters?

If you are having trouble browsing Unicode Vietnamese web pages, such as the BBC Vietnamese site, and see boxes or question marks instead of characters that carry two diacritics, you probably need an extra font. The problem arises because certain Vietnamese sites specify the Arial font in their HTML, but the Arial (version 2.60) that ships with OS X 10.4 does not contain the precomposed Vietnamese characters in the Latin Extended Additional Unicode block (U+1E00 to U+1EFF). Most people never notice a problem because at some point they installed a trial or other version of Microsoft Office 2004, which puts a more complete Arial (version 3.05) in the Home/Library/Fonts folder. If you *do* have missing characters, try to find a copy of this font for your machine.
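To see which characters in a piece of text depend on that block (and therefore on a font that covers it), a short Python sketch can list every code point between U+1E00 and U+1EFF. This is not from the original post, and the function name `extended_additional_chars` is hypothetical. It checks the text exactly as written, so a tone mark typed as a separate combining character is not reported.

```python
import unicodedata

# Latin Extended Additional block: U+1E00 through U+1EFF.
LATIN_EXT_ADDITIONAL = range(0x1E00, 0x1F00)

def extended_additional_chars(text: str) -> list[str]:
    """Return distinct characters of `text` in Latin Extended Additional,
    in order of first appearance. The text is NOT normalized first."""
    found: list[str] = []
    for ch in text:
        if ord(ch) in LATIN_EXT_ADDITIONAL and ch not in found:
            found.append(ch)
    return found

if __name__ == "__main__":
    # "Tiếng Việt" written with precomposed tone letters.
    for ch in extended_additional_chars("Ti\u1ebfng Vi\u1ec7t"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For the sample string this reports U+1EBF and U+1EC7, the two letters an older Arial would show as boxes.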


Andj said...

Actually, the situation with BBC Vietnamese is somewhat more complicated. Some articles (especially older ones) were typed with third-party Vietnamese input systems that used only precomposed characters.

At some point the BBC switched to the Microsoft Vietnamese keyboard layout, which uses combining diacritics for the five tone marks.

Articles viewed on the site may contain a mix of these two approaches.
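The two approaches produce different code point sequences for the same letter. The Python sketch below (not from the original comment) shows three spellings of "ế": one precomposed code point; a precomposed "ê" plus a combining acute, which is assumed here to be what a tone-mark-as-combining layout emits; and a fully decomposed form. Unicode normalization (NFC or NFD) maps all three to one canonical form.

```python
import unicodedata

# Three ways the same Vietnamese letter "ế" can appear in a page.
precomposed = "\u1ebf"              # one code point from Latin Extended Additional
keyboard_style = "\u00ea\u0301"     # ê (precomposed) + combining acute tone mark
fully_decomposed = "e\u0302\u0301"  # e + combining circumflex + combining acute

variants = [precomposed, keyboard_style, fully_decomposed]

# The raw strings differ, so naive comparison or search treats them as different text.
print([len(v) for v in variants])  # [1, 2, 3]

# NFC composes every variant into the single precomposed code point...
print({unicodedata.normalize("NFC", v) for v in variants} == {precomposed})  # True

# ...and NFD decomposes every variant into base letter + combining marks.
print({unicodedata.normalize("NFD", v) for v in variants} == {fully_decomposed})  # True
```

Normalizing to NFC before display or search is one common way to make mixed text behave consistently, though whether a page renders correctly still depends on the font.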

Tom Gewecke said...

Thanks for that info! Indeed, I just checked the lead story on the site and found that the first 10 paragraphs were precomposed and the next 9 decomposed!
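A check like this can be approximated in Python. The sketch below is an assumption-laden illustration, not the method used above, and the names `uses_combining_marks` and `label_paragraphs` are hypothetical. It labels a paragraph "decomposed" if it contains any combining character (nonzero canonical combining class) and "precomposed" otherwise. With that crude rule, a paragraph with no diacritics at all also counts as "precomposed".

```python
import unicodedata

def uses_combining_marks(paragraph: str) -> bool:
    """True if the paragraph contains any combining character."""
    return any(unicodedata.combining(ch) for ch in paragraph)

def label_paragraphs(paragraphs: list[str]) -> list[str]:
    """Label each paragraph 'decomposed' or 'precomposed' by the rule above."""
    return ["decomposed" if uses_combining_marks(p) else "precomposed"
            for p in paragraphs]

if __name__ == "__main__":
    sample = [
        "Ti\u1ebfng Vi\u1ec7t",              # precomposed tone letters
        "Ti\u00ea\u0301ng Vi\u00ea\u0323t",  # ê + combining tone marks
    ]
    print(label_paragraphs(sample))  # ['precomposed', 'decomposed']
```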