July 2007

Using Indian Languages on Computers

“Given that providing e-Governance services to a broad base of citizens is likely to pose many other challenges that we are all familiar with, localisation can take us a significant distance in providing e-Governance services,” says Dr. Srinivasan Ramakrishnan (ramki@cdacindia.com), Director General, Centre for Development of Advanced Computing (C-DAC), Pune.

The Centre for Development of Advanced Computing (C-DAC) is a pioneer in developing and proliferating the use of Indian languages on computers. What are the broad vision and goals of C-DAC’s technology mission?

The broad vision and goals of C-DAC’s technology mission are what we call “nurturing living languages and dissolving language barriers”. This is particularly relevant in an inherently multilingual country like India, where we are in the unique position of having 22 officially recognised languages. Its broad goals include:

  • Seamless interaction with computers for the common man, and the tools that developers need to build for the common man.
  • Setting language standards
  • Contributing to linguistic tools and lexicon resources for building spellcheckers, grammar checkers, thesauri, ontology, etc.
  • World class research in the area of Indian languages, semantic web and semantic search.
  • Multimodal interactions such as optical character recognition, handwriting recognition, speech recognition and text-to-speech
  • Solutions in the area of Indian language machine-assisted translation systems, cross-lingual information retrieval and summarisation
  • Frameworks that support application software developers (such as those in the e-Governance domain) and independent software vendors (ISVs, companies that specialise in making or selling software, especially for niche markets) across a number of contexts and platforms, including evolving World Wide Web standards that support localisation.

How important is localisation in furthering e-Governance, especially in developing countries?

To the extent that localisation, especially in developing countries, means giving people access to services they can relate to easily, it brings user interfaces in local languages into the picture. A local-language user interface is something most people are likely to be comfortable with, so it removes an additional barrier to the provision of e-Governance services. Given that providing e-Governance services to a broad base of citizens is likely to pose many other challenges that we are all familiar with, localisation can take us a significant distance in providing e-Governance services.

In many situations, localisation is necessary for the successful implementation of e-Governance in any country. It makes it easy for an ordinary person with minimal education in a rural area to obtain services through computers and the internet. In India, where over 90% of the populace is non-English speaking, localisation brings the added advantage that people can readily fill in forms, get information on government schemes and carry out interactions of day-to-day relevance. Localisation also removes the fear or mental block around using such systems, across age groups. I would definitely say that for any e-Governance project to be successful, the choice of interface and interactions in the local language is absolutely essential. Besides, we can also think of multimodal interactions in local languages in the coming years, making access to services even friendlier.

What are the different multilingual technologies CDAC is adopting for localisation?

As I mentioned earlier, C-DAC has been working in the area of Indian language computing technologies over the last 18 years. Its technology standards and products range from the Indian Script Code for Information Interchange (ISCII) to Unicode, and from linguistic tools, lexical resources and corpora for building spellcheckers, grammar checkers and thesauri in all 22 official languages to productivity tools, domain-specific dictionaries, synonym dictionaries, i-Plugin, GIST SDK, transliteration tools, database translation utilities and bulk HTML converters. C-DAC has been adopting many multilingual technologies.
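
As an illustration only: the interview does not describe how C-DAC's transliteration tools work, so the following is a deliberately naive sketch and not their method. It relies on the fact that the major Indic script blocks in Unicode follow a parallel layout inherited from ISCII, so many letters can be moved between scripts by shifting code points from one block to another. The function name `naive_transliterate` and the `BLOCK_START` table are hypothetical names chosen for this example.

```python
import unicodedata

# First code point of each script's Unicode block. The blocks are 128 code
# points wide and share a common relative layout inherited from ISCII.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def naive_transliterate(text, source, target):
    """Shift characters from the source script block into the target block.

    Characters outside the source block are copied unchanged. If the shifted
    code point is unassigned in the target script, the original character is
    kept so the gap stays visible instead of producing garbage.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)


# Example: the Devanagari word "कमल" shifted into the Bengali block.
print(naive_transliterate("कमल", "devanagari", "bengali"))
```

Real transliteration needs script-specific rules on top of this (letters that exist in one script but not another, differing conventions for vowels and conjuncts), which is why the offset shift is only a starting point.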

The process and tools used for localisation depend upon the application software to be localised, the environment in which it works (the operating system) and whether or not it adheres to standards. Besides, localisation can mean going beyond translation of the user interface to include date, time, currency, calendar and similar attributes of the local context and culture. C-DAC has developed Indian language technologies addressing all these issues.
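
To make "going beyond translation" concrete, here is a minimal sketch of three locale attributes handled in code: Indian digit grouping (lakh and crore), Devanagari digits, and day-first dates. It is an illustration under stated assumptions, not C-DAC's implementation; the function names `group_indian`, `to_devanagari_digits` and `format_date_dmy` are hypothetical, and a production system would normally rely on a locale database instead of hand-written rules.

```python
from datetime import date


def group_indian(n: int) -> str:
    """Group digits the Indian way: last three together, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # peel off pairs from the right
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F.
_DEVANAGARI_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0966 + i) for i in range(10))
)


def to_devanagari_digits(text: str) -> str:
    """Replace ASCII digits with Devanagari digits, leaving the rest as-is."""
    return text.translate(_DEVANAGARI_DIGITS)


def format_date_dmy(d: date) -> str:
    """Format a date day-first (DD-MM-YYYY), as commonly written in India."""
    return d.strftime("%d-%m-%Y")


amount = 1234567
print(group_indian(amount))
print(to_devanagari_digits(group_indian(amount)))
print(format_date_dmy(date(2007, 7, 15)))
```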

In fact, hard-coded localisable strings pose great difficulty, since customised tools need to be developed for each of them, and for a specific application and operating system at that. At times such tools have also been necessary. C-DAC has also been instrumental in developing a localisation framework that facilitates semi-automatic conversion of websites to Indian languages. This localisation framework is based on a translation memory approach.
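
The interview gives no detail on how C-DAC's framework is built, so the following is only a minimal sketch of the general translation memory idea, not a description of that framework. Previously translated segments are stored; an exact match is reused automatically, a close match is offered for human review, and anything else is left for a translator, which is one way to read "semi-automatic". The class `TranslationMemory`, the 0.8 threshold and the sample strings are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical in-memory store of previously translated segments."""

    def __init__(self, review_threshold=0.8):
        self.entries = {}  # normalised source segment -> translation
        self.review_threshold = review_threshold

    @staticmethod
    def _normalise(text):
        # Ignore case and collapse whitespace so trivial variants still match.
        return " ".join(text.lower().split())

    def add(self, source, target):
        self.entries[self._normalise(source)] = target

    def lookup(self, source):
        """Return (translation, status) for one source segment.

        status is "auto" for an exact match, "review" for a fuzzy match
        that a human should confirm, and "untranslated" otherwise.
        """
        key = self._normalise(source)
        if key in self.entries:
            return self.entries[key], "auto"
        best_target, best_score = None, 0.0
        for stored, target in self.entries.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_score >= self.review_threshold:
            return best_target, "review"
        return None, "untranslated"


tm = TranslationMemory()
tm.add("Submit", "जमा करें")
tm.add("Application form", "आवेदन पत्र")

for segment in ["submit", "Application forms", "Weather report"]:
    print(segment, "->", tm.lookup(segment))
```

The design choice worth noting is the middle tier: reusing fuzzy matches without review would silently propagate wrong translations, so near matches are only suggested, never applied automatically.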

For the localisation of free and open source software such as BharateeyaOO, browsers, email clients and multi-protocol messengers, standard tools available in the free and open source domain are being used.

How important are standards in local language computing?

Standards are of utmost importance. Standards are like traffic rules: whether one likes them or not, one needs to follow them, otherwise troubles abound. Non-compliance with standards can pose a serious threat to data compatibility and portability across platforms. A huge amount of resources may have to be deployed to convert legacy data to the standards.

India being a multilingual country with 22 scheduled languages plus English as an associate language, it is important to follow standards (indeed, it should be made mandatory). Adherence to standards also makes life simpler in areas such as Indian language search, cross-lingual information retrieval and the semantic web.

Standards also open up opportunities on the global business front. The major standards for state governments to adhere to may include input methods, storage standards, font standards, lexicons and, most importantly, the W3C standards.

How developed is India, as compared to other countries in Asia, in local language technology?

As far as local language computing is concerned, there are two goals: first, to enable support for local languages in systems and devices, and second, to work in the areas of natural language processing, speech technologies, image processing and so on.

Compared to other countries, local language technology in India, especially the enabling kind, has been satisfactory (with a few exceptions), while in natural language processing and other areas there is a lot to do. Unlike other countries, India faces the compounding effect of 22 languages, which has posed serious technological challenges. So, as far as development in this area goes, it is a little below satisfactory. This is simply because of the complexities of the domain.

Please tell us about some of your future plans for advancing the localisation programme at C-DAC.

C-DAC has already started work on localising some basic tools that the common man (including government staff) requires on a day-to-day basis, in all 22 scheduled languages. They include the Open Office suite, browsers, email clients, multi-protocol messengers, content management systems, GIS packages and so on. For 10 Indian languages (Tamil, Hindi, Telugu, Marathi, Malayalam, Oriya, Assamese, Urdu, Kannada and Punjabi) the work has been completed, while the work for the rest of the languages is scheduled to be completed by the end of this year. These will provide a sound foundation for comprehensive localisation of e-Governance applications in all languages in the country.

Ongoing work, as indicated earlier, will also help address emerging technologies in the coming years.
