Towards Telugu Language Processing: Aims
(Where we set the context in which our efforts lie)
1) Aims:
We'll first define what we mean by Telugu language processing (TLP; a totally stupid abbreviation, but very convenient). We take Telugu language processing to mean being able to:
- Read OS-recognisable, widely copy-pastable, standards-compliant Telugu characters.
- Type such Telugu characters in regular, standards-compliant HTML files, through standards-compliant browsers and/or text editors, with all the regular word-processing actions.
- Render Telugu characters properly.
- Replicate that rendering universally, or near-universally, on other systems.
This, we believe, is possible only if our rendered Telugu characters are recognised as Telugu characters both within an OS and across OSes, rather than by simply getting the right shapes and figures up on screen. We achieve this by choosing an internationally accepted character standard, Unicode, and seeing how we may render Telugu characters in that standard.
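To make "recognised as Telugu characters" a little more concrete, here is a minimal Python sketch of how a program can tell Telugu text from text that merely looks like Telugu. It relies on the fact that Unicode assigns Telugu the block U+0C00 to U+0C7F. The helper names `is_telugu_char` and `telugu_ratio` are our own, purely for illustration, and the sketch assumes that checking block membership is a good enough test for this purpose.

```python
import unicodedata

# Unicode assigns the Telugu script the block U+0C00..U+0C7F.
TELUGU_START, TELUGU_END = 0x0C00, 0x0C7F


def is_telugu_char(ch):
    """Return True if the single character lies in the Unicode Telugu block."""
    return TELUGU_START <= ord(ch) <= TELUGU_END


def telugu_ratio(text):
    """Fraction of non-whitespace characters that are Telugu code points."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_telugu_char(c) for c in chars) / len(chars)


# The word "Telugu" in Telugu script, written with escapes so the
# source file itself stays plain ASCII.
sample = "\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"

if __name__ == "__main__":
    for ch in sample:
        # Print each code point with its official Unicode character name.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    print("Telugu ratio:", telugu_ratio(sample))
```

Text produced with a non-Unicode font that draws Telugu shapes over other code points would score close to zero here, however right it looks on screen, which is exactly the distinction we are drawing.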
2) Steps to TLP-compliance using Unicode:
A finer point here: we distinguish between "Unicode Telugu compliance", which is being able to read Unicode Telugu glyphs, and "TLP-compliance using Unicode", which is being able to both read and write them. Unicode in itself isn't our objective, but it is a very useful means to that objective.
Broadly speaking, there are three steps to being fully TLP-compliant (in no particular order, neither chronological nor by priority):
- Getting the font right: Being able to see Telugu letters ('aksharaalu')
- Getting the rendering right: Being able to see rendered Telugu glyphs ('guNintaalU, vottulatO aksharaalu')
- Typing them in.
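The difference between the first two steps is easier to see at the code-point level. The sketch below, in Python, spells out one bare letter, one 'guNintam' and one 'vottu' form as Unicode sequences. The choice of the letter KA and the helper name `describe` are ours, for illustration only. A font that has the letters but lacks the shaping rules would show the pieces side by side instead of one joined glyph.

```python
import unicodedata

# TELUGU SIGN VIRAMA: placed between two consonants to form a conjunct.
VIRAMA = "\u0C4D"


def describe(text):
    """List each code point in text with its official Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


ka = "\u0C15"                       # bare letter (aksharam): KA
ki = "\u0C15\u0C3F"                 # guNintam: KA + VOWEL SIGN I
kka = "\u0C15" + VIRAMA + "\u0C15"  # vottu: KA + VIRAMA + KA

if __name__ == "__main__":
    for label, text in (("letter", ka), ("guNintam", ki), ("vottu", kka)):
        print(label, "->", len(text), "code point(s)")
        for line in describe(text):
            print("   ", line)
```

Each of the three is meant to appear as a single unit on screen, yet the second and third are stored as two and three code points. Combining them into one shape is the rendering engine's job, not the font's alone.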
A final point of clarification before ending this post: while we elaborated on Telugu characters specifically, it is important to bear in mind that the effort for other Indic languages is very similar; we focus on Telugu first, and will then extend the work to other Indic languages.
4 Comments:
Any upcoming posts on browser support?
You do know that Telugu script looks awful in Mozilla Firefox. Any updates on that situation? Did they correct the issues in a fresh release?
By Telugu fonts, you mean the stuff that Eenadu and others use? Then, fuhgetaboutit; those are NOT Unicode fonts. They are teh evil.
There is already a bug raised about some rendering problems in Mozilla with Telugu Unicode script.
But Mozilla with Pango rendering on Linux is supposed to show Telugu properly. (I never tried it, but I believe so from that bug's info and the screenshots provided.)
You can try it.
I have a good idea to develop a Telugu-to-English dictionary, like the WordWeb dictionary....
I have been working on Telugu for the past six months. Can I contact you?