Saturday, November 20, 2004

The Technical Nitty-Gritty: Telugu on Win XP/2k3

(Couldn't get easier than this, trust me on this. Easier to get it out of the way first before going on to the other funky stuff)

  1. Go to Start > Control Panel > Regional and Language Options.
  2. Select the second tab ("Languages").
  3. Tick the checkbox for "Install files for complex script and right-to-left languages (including Thai)".
  4. Pop the CD in. Get the 10MB install.
  5. Go to the next tab. Select the language bar. Add Telugu to the list.
  6. Take a deep breath. You're done!

The good folk at Bhasha India seem to have some screenshots of the process up. Bear in mind, though, that Telugu (like Kannada, Gurmukhi, Bengali, Oriya, Malayalam and Gujarati) is currently unsupported in Win2k. Also bear in mind that the default Telugu keyboard is Inscript; it is NOT RTS-compatible.

Our next post, which will be up in a couple of hours' time, will deal with these two things: getting Telugu up at an OS level in Win2k, and then extending that to give native RTS-compatibility. Yup, found a hack, folks.

Towards Telugu Language Processing: Aims

(Where we set the context in which our efforts lie)

1) Aims:

We'll first define what we mean by Telugu language processing (TLP; a totally stupid abbreviation, but very convenient). We identify Telugu language processing as being able to:-

  1. Read OS-recognisable, widely copy-pastable, standards-compliant Telugu characters.
  2. Type such Telugu characters in regular, standards-compliant HTML files, through standards-compliant browsers and/or text editors with all regular word-processing actions.

The emphasis here, therefore, is on the following:-
  1. Proper rendering of Telugu characters
  2. A universal, or near-universal, replication of the same rendering in other systems

This, we believe, is possible only with an intra-, and an inter-, OS level recognition of our rendered Telugu characters as being Telugu characters, rather than simply getting the right shapes and figures up. We achieve this by choosing an internationally accepted character standard, Unicode, and seeing how we may render Telugu characters in the said standard.
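To make "recognition as Telugu characters" a little more concrete, here is a minimal Python sketch. It relies only on the fact that Unicode assigns Telugu the block U+0C00 to U+0C7F; the helper names (`is_telugu_char`, `telugu_ratio`) are made up for illustration and are not part of any standard tool.

```python
import unicodedata

# The Unicode Telugu block spans U+0C00..U+0C7F.
TELUGU_BLOCK = range(0x0C00, 0x0C80)


def is_telugu_char(ch: str) -> bool:
    """Return True if the single character lies in the Telugu block."""
    return ord(ch) in TELUGU_BLOCK


def telugu_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Telugu code points."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_telugu_char(ch) for ch in chars) / len(chars)


# The word "telugu" written as Unicode escapes, so the source stays ASCII-safe.
sample = "\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"

# Any Unicode-aware system can name each character, not just draw its shape.
names = [unicodedata.name(ch) for ch in sample]
```

The point of the sketch is that the text carries its identity with it: a program on any OS can tell that these are Telugu characters without knowing anything about the font used to draw them.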

2) Steps to TLP-compliance using Unicode:

A finer point here, but we distinguish between "Unicode Telugu compliance", which is being able to read Unicode Telugu glyphs, and "TLP-compliance using Unicode", which is being able to read and write Unicode Telugu glyphs. Unicode in itself isn't our objective, but is a very useful means to the said objective.

Broadly speaking, there are three steps to being fully TLP-compliant (in no particular order, not chronological, not on the basis of priority):-

  • Getting the font right: Being able to see Telugu letters ('aksharaalu')
  • Getting the rendering right: Being able to see rendered Telugu glyphs ('guNintaalU, vottulatO aksharaalu')
  • Typing them in.
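The difference between the first two steps shows up in how the text is stored. A rough Python sketch, using only the standard `unicodedata` module (the `describe` helper is a made-up name for illustration): a bare letter is one code point, while a guNintam or a vottu is a sequence of code points that the rendering engine has to combine into a single shape.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """List each code point in text as (U+XXXX, Unicode character name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# A bare letter ('aksharam'): a single code point, needs only a font.
ka = "\u0C15"

# A 'guNintam': consonant KA followed by the dependent vowel sign I.
ki = "\u0C15\u0C3F"

# A 'vottu' (conjunct): KA + VIRAMA + KA, drawn as one stacked cluster.
kka = "\u0C15\u0C4D\u0C15"

for label, text in [("ka", ka), ("ki", ki), ("kka", kka)]:
    print(label, describe(text))
```

With a Telugu font but no complex-script rendering, the last two would typically show up as separate, unjoined pieces, which is why the font and the rendering are listed as separate steps.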

A final point of clarification before ending this post. While I elaborated on Telugu characters specifically, it is important to bear in mind that the effort for other Indic languages is very similar; we focus on Telugu first, and then extend it to other Indic languages.

How ZeM posts work: This is a title.

(This is a quasi-abstract; we try to summarize everything here into neat, simple sentences)

1) This is a Level 1 Heading.

    1.1) This is a Level 2 Heading.
This is how we cite - Me

This is a para where our actual text would lie. While this post in itself is useless for you readers, it is, nevertheless, a ready reckoner for me while I'm posting. From my next post onwards, everything will be 100% on-topic.

Welcome to ZeM!

This blog actually grew out of my efforts at explaining how to read my Telugu blog, di. di. ka. I made the 'mistake' of googling before posting there, and presto, suddenly hit paydirt; found stuff that I've been searching for over two years now. Which is when I realised a single post wouldn't be enough for this; Indic language processing, in fact, would be an ongoing technical project that's best discussed in English to reach a wider audience.

Just to give a slightly brief personal overview, and as I've mentioned in an email I just wrote, I am obsessive enough about the topic to be even fanatical at times. I was, indeed, about to spend all of Friday night working on this in my cubicle, when a co-worker came over and basically dragged me out to dinner; the time lag between my post at Flocci and this one can basically be explained by some garlic naan, palak paneer and chicken tikka.

Things that, of course, will not stop ZeM anymore; while I don't want to promise things that I may or may not deliver, let's just say I'm on the verge of doing something very very exciting, and frankly, I still haven't gotten over the initial exuberance.