Tuesday, September 23, 2008

Getting started with Indic font development

Creating Indic fonts for Unicode is no longer a matter of drawing the letters (or parts of letters) and mapping them to the characters on the keyboard, typically the Latin letters and symbols the keys already produce. Some of us call this old technique 'hacked' encoding.

In other words, the 'characters' stored in the text are not what they appear to be on screen! This technique does not work well in modern applications, which are becoming more and more Unicode-aware. Applications attempt to 'predict' the language of the text, and that is only possible when the text is encoded in Unicode. Since I'll be talking mostly about Indic Unicode fonts, the term 'Unicode fonts' in these articles usually refers to Indic Unicode fonts.
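To see why applications struggle with hacked text, compare what a program actually receives in each case. The sketch below is illustrative only: `HACKED_TEXT` is a hypothetical stand-in for whatever Latin keys a legacy font repurposed, and `guess_script` is a hypothetical helper that uses a crude heuristic (the first word of a letter's Unicode name), not a real script-detection algorithm.

```python
import unicodedata

# Hypothetical legacy text: the stored characters are ordinary Latin letters;
# only the hacked font makes them look like Tamil on screen.
HACKED_TEXT = "ka"

# Tamil text stored as real Unicode: TAMIL LETTER KA + TAMIL VOWEL SIGN AA.
UNICODE_TEXT = "\u0B95\u0BBE"


def guess_script(text: str) -> str:
    """Heuristic: return the first word of the first letter's Unicode name."""
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            # Names look like 'LATIN SMALL LETTER K' or 'TAMIL LETTER KA'.
            return unicodedata.name(ch).split()[0]
    return "UNKNOWN"


print(guess_script(HACKED_TEXT))   # LATIN
print(guess_script(UNICODE_TEXT))  # TAMIL
```

However good the hacked font looks, a spell checker, search box or sorting routine only ever sees Latin letters.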

Creating Unicode fonts poses a number of challenges. The main ones are below.

1. Creating the glyph repertoire. Unicode fonts generally have more glyphs than characters. (To understand the difference, see http://unicode.org/reports/tr17/.) I typically do not decide on all the required glyphs before I start drawing, especially when I am developing the first font for a script.

2. Understanding how the characters combine to form 'letters'. I enjoy learning to read different scripts, so for me this is the fun part :). There are plenty of documents available on the Web. However, they all seem to give very general information, and no single one is enough to understand the rules. As I go along, I'll share the details I worked out for each script.

3. Implementing the combining rules in the layout technology: OpenType for Windows (and some Linux systems) and AAT for Mac OS. Legacy font designers who created hacked fonts find this hard to understand. Why should a typeface designer write computer code? They have a point. That is why good typography for Indic fonts involves a lot more work than just designing the type.
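The three challenges meet in even a tiny example. The sketch below is a toy model, not how any particular shaping engine or font works. It splits Telugu text into clusters using simplified character classes (an assumption; real engines also handle ZWJ/ZWNJ, nukta and more). It then maps characters to glyphs through a hypothetical `CMAP` and applies hypothetical `LIGATURES` rules, loosely modelled on an OpenType GSUB ligature substitution. All glyph names are invented.

```python
import re

# Simplified Telugu character classes (an assumption, not the full rules).
CONS = "[\u0C15-\u0C39]"                 # consonants KA..HA
VOWEL = "[\u0C05-\u0C14]"                # independent vowels
VIRAMA = "\u0C4D"
MATRA = "[\u0C3E-\u0C4C\u0C55\u0C56]"    # dependent vowel signs
MODIFIER = "[\u0C00-\u0C03]"             # candrabindu, anusvara, visarga

# A cluster: (consonant + virama)* consonant, then an optional virama or
# vowel sign and an optional modifier; or an independent vowel; or any char.
CLUSTER = re.compile(
    f"(?:{CONS}{VIRAMA})*{CONS}(?:{VIRAMA}|{MATRA})?{MODIFIER}?"
    f"|{VOWEL}{MODIFIER}?"
    "|.",
    re.DOTALL,
)

# Hypothetical character-to-glyph map (glyph names are invented).
CMAP = {"\u0C32": "la", "\u0C15": "ka", "\u0C4D": "virama", "\u0C37": "ssa"}

# Hypothetical ligature rules over glyph sequences; longer matches win.
LIGATURES = {
    ("ka", "virama", "ssa"): "ka_ssa",
    ("ka", "virama"): "ka_dead",
}


def clusters(text: str) -> list[str]:
    """Split text into simplified Telugu clusters."""
    return CLUSTER.findall(text)


def shape_cluster(cluster: str) -> list[str]:
    """Map characters to glyphs, then apply longest-match ligatures."""
    glyphs = [CMAP.get(ch, ".notdef") for ch in cluster]
    longest = max(len(seq) for seq in LIGATURES)
    out, i = [], 0
    while i < len(glyphs):
        for n in range(min(longest, len(glyphs) - i), 1, -1):
            seq = tuple(glyphs[i:i + n])
            if seq in LIGATURES:
                out.append(LIGATURES[seq])
                i += n
                break
        else:  # no ligature starts here: keep the default glyph
            out.append(glyphs[i])
            i += 1
    return out


def shape(text: str) -> list[str]:
    """Shape each cluster separately so rules never cross cluster boundaries."""
    return [g for c in clusters(text) for g in shape_cluster(c)]


text = "\u0C32\u0C15\u0C4D\u0C37"  # LA, KA, VIRAMA, SSA: four characters
print(shape(text))  # ['la', 'ka_ssa']
```

Four characters come out as two glyphs, which is the character/glyph distinction from item 1 in miniature. Shaping cluster by cluster is a simplification chosen to keep the example short.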

The challenges above are listed in no particular order. In fact, I consider all of them before I start working on a font. The first font for a script is the most difficult to create. Subsequent ones are largely cookie-cutter, unless you want to introduce contextual behavior for the shapes. More on this later.

I've developed fonts for Tamil, Malayalam, Hindi, Sinhala and Jawi (the Arabic script used to write Malay). Most recently, I worked on Bengali and Telugu. Since Telugu is fresh in my mind, and stretched my understanding of complex scripts quite a bit, I may start with it.

With the exception of Tamil and Malay, I am not a native speaker of any of these languages. I've learnt to read the scripts, but I have no idea what most of the words mean. So there are bound to be finer details that I have missed. However, the fonts I created are good for general use. My hope is to make these articles useful to other non-native speakers as well.

All Indic languages share common characteristics. Once you know one script, learning another is quite trivial.
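One concrete, if superficial, sign of these common characteristics shows up in Unicode itself. The blocks from Devanagari through Malayalam were laid out following the older ISCII arrangement, so the same letter usually sits at the same offset within each block. The offsets below are my own illustration. Sinhala's block does not follow this layout, and Tamil's block has gaps where Tamil lacks a letter.

```python
import unicodedata

# First code point of each block (only ISCII-derived blocks are listed).
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Malayalam": 0x0D00,
}

# Offsets of two characters within each block.
OFFSETS = {"KA": 0x15, "VIRAMA": 0x4D}

for script, start in BLOCK_START.items():
    names = [unicodedata.name(chr(start + off)) for off in OFFSETS.values()]
    print(f"{script:<10} {names}")
```

Each line prints that script's LETTER KA and SIGN VIRAMA, which is part of why the combining rules learned for one script often carry over to the next.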

In every case, I learned the script by attempting a font for it.
