Saturday, December 8, 2012

Unicode Bug in Pages

A poster in the ASC forums recently reminded me of a weird, long-standing bug in the Pages app: you cannot directly input the Unicode characters ZWJ (zero width joiner, 200D) or ZWNJ (zero width non-joiner, 200C).  When you try to do so, they simply never appear in the text.  I don't know of any other app with this problem, which was noted in this blog back in 2008.

The main result of this bug is that certain character sequences used in languages which employ the Arabic, S. Asian, and SE Asian scripts cannot be written properly. A particularly notable example is the "Sri" in the name of the country Sri Lanka.  In its native Sinhala script, this is written with the sequence 0DC1 0DCA 200D 0DBB 0DD3.  When the 200D is left out, the result is wrong.
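For anyone who wants to check which code points a piece of text really contains, here is a minimal Python sketch. It builds the "Sri" sequence quoted above with and without the 200D and lists the code points in each. The names `describe`, `sri_with_zwj`, and `sri_without_zwj` are my own for illustration and have nothing to do with Pages itself.

```python
import unicodedata

ZWJ = "\u200D"   # zero width joiner
ZWNJ = "\u200C"  # zero width non-joiner

# The Sinhala "Sri" sequence from the post: 0DC1 0DCA 200D 0DBB 0DD3
sri_with_zwj = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"

# What is left when the joiner is dropped: 0DC1 0DCA 0DBB 0DD3
sri_without_zwj = sri_with_zwj.replace(ZWJ, "")


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


if __name__ == "__main__":
    for label, text in (("with ZWJ", sri_with_zwj),
                        ("without ZWJ", sri_without_zwj)):
        print(label)
        for line in describe(text):
            print("  " + line)
```

Running the same `describe` function on text copied out of a document should show whether the 200D or 200C made it through, since neither character is visible on screen.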

A possible workaround is to write the text that needs these characters in TextEdit and then copy and paste it into Pages.

