Creating documents in local languages with ease is now well established, although free software is still rarely used for this in e-Governance. Localisation in e-Governance, however, requires much more than document creation: it requires fluent use of local languages at mass scale. For this to happen, the localisation components must fit snugly into the government system, in particular into the database and public interface systems currently in use. Here localised free/open source software has a tremendous advantage, since it offers the freedom to adapt without costing a bomb. The national asset of software talent is of little worth if it cannot or does not take on this challenge.
Background: Over ripe
What has happened over the last few years has laid the ground for breaking the language barrier in the digital world. National and international state initiatives, voluntary movements, academic research, individual zeal and commercial interests have all propelled it.
Debates were often reduced to national pride versus multinational conspiracy theories, if not to cacophony. The element of truth on both sides was clouded by the immaturity of the technology. At this juncture the Free/GNU platform started making strides on the software front in many ways, though a little slowly on Indian languages. This slowness, however, came from moving steadily towards universal standards and was hence almost deliberate. The GNU/Linux community then adopted the Unicode standard after deep thought, and could do so easily.
The technological springboard
Over the last few years, the efforts of corporations (most notably IBM) have made many localisations technically and commercially feasible. These efforts include establishing locale specifications, i.e. culturally specific conventions for a given language/nation/script; insisting on the Unicode standard; supporting GNU/Linux vendors; supporting GNU/Linux as a platform for many of IBM's own applications; and ICU, Eclipse, etc. For many young minds seeking a career in software, ‘Linux’ started to look like a real alternative. Sun Microsystems' support for OpenOffice.org (again with support from IBM) made office users see an alternative. Similarly, support from many other international corporations has made ‘Linux’ a respectable name in corporate and government parlance. The undersigned sees this as the springboard for a quantum jump.
As a passing observation, one must point out that all this emphasis on ‘Linux’ and ‘Open Source’ misses the ‘free’ (as in freedom) element of the GNU/Linux movement. That remains the challenge for ethically inspired proponents of free software like the undersigned.
Free software in localisation
Localisation in general, and in Indian languages in particular, poses the issue of free software head-on. If the mass of the Indian population is to access the benefits of the IT revolution, proprietary tags and the associated price tags will prove insurmountable obstacles. Freedom will be a necessary condition for reaching out to people. An economic model that makes this viable, together with the financial muscle that government may provide, would supply the sufficient condition. Government policy will play a major role in this.
Till recently, the undersigned was convening the localisation effort through a non-profit, unorganised group of trainee-volunteers called the Indictrans team. Over the last year, the Indictrans team has put its emphasis on deploying free software in e-Governance and, to a lesser extent, in education and the rural context.
Our principal aim was to identify the stumbling blocks in the adoption of free, localised open source software in e-Governance. More importantly, we looked at the difficulties in the government-citizen interface. The team also looked at the problems of adopting the Unicode standard in applications developed for government functioning, and much more.
This may be seen in contrast to many others who started much earlier than us and have been doing yeoman's service to Indic localisation. They are completing the important tasks of localising GUIs like GNOME/KDE and some applications. See the www.indlinux.org site for more details of the language teams. While the number of users who want a complete GUI in their local language is growing, they will significantly affect the use of software only after a few years. There are teams working on even more ambitious projects, such as machine translation from one Indian language to another. Machine translation, we expect, will mature only after a few years, i.e. after massive corpora have been analysed and lexicons built. These tasks require much greater resources and skills and a much deeper commitment. Many enthusiasts have been working on these projects with frugal resources. The Indictrans team has always acknowledged that we stand on the shoulders of these teams.
The following is a brief description of two major tasks accomplished by the Indictrans team in the area of localisation. The first was standardisation, i.e. conversion of live data and a file-journey-management database from non-Unicode encodings to the Unicode standard. The second was a voterlist search engine for the Chief Electoral Officer of Maharashtra.
Standardisation: Conversion to Unicode
When we started in August 2003, we chose government offices as the starting point for implementing our technology, as we saw a lot of potential in interacting with the community through this channel. The biggest obstacle we could see was the huge silos of data already created with older technology. This was preventing some bureaucrats who were positively inclined towards Free/Open Source solutions, such as those in the Directorate of IT, Maharashtra, from adopting them in their offices.
We therefore decided first to provide a solution for converting this legacy data to the new open standard, i.e. Unicode. The Department of IT, Maharashtra already had a system in place called DJMS (Document Journey Management System). It was a browser-based system supported by documents and a database of metadata for these documents. The application was used for transparently tracking each and every document as it passed from one table to another. Their major problem was making this available in the local language, i.e. Marathi, in an affordable manner. They also wanted a standard solution for managing Marathi content at the administrative level (i.e. the database level). They were using ISM/ISFOC as their solution for Marathi. Unicode sounded like the perfect solution for them, as it offered minimal re-engineering at the administrative level and an affordable alternative at the user level.
We undertook the job of converting the documents from ISFOC to Unicode using a converter developed by us. The files were in various formats such as doc, xls, rtf and html, and there were more than 6000 of them. We also converted the database, which was based on DB2, so that the documents could be searched in Marathi using the existing application. DIT now uses open source applications like OpenOffice.org and Mozilla across different platforms. All this was completed in less than six months.
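The core of such a conversion can be sketched in a few lines of Python. This is an illustrative sketch only, not the converter described above: the glyph table `LEGACY_TO_UNICODE` is invented for the example and does not reproduce the real ISFOC layout, and the function names are hypothetical. It shows two steps that font-encoded Devanagari text needs: mapping glyph codes to Unicode characters, and moving the short-i matra, which legacy fonts store before the consonant (visual order), to its Unicode position after the consonant (logical order).

```python
# Toy legacy-font to Unicode converter.
# ASSUMPTION: this table is invented for illustration; it is NOT the ISFOC layout.
LEGACY_TO_UNICODE = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "t": "\u0924",  # DEVANAGARI LETTER TA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "i": "\u093F",  # DEVANAGARI VOWEL SIGN I (stored before the consonant in legacy text)
}

SHORT_I = "\u093F"


def is_consonant(ch):
    """True for the Devanagari consonants KA..HA (U+0915 to U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def legacy_to_unicode(text):
    """Convert text in the toy legacy encoding to Unicode Devanagari."""
    # Step 1: map each legacy code to its Unicode character;
    # characters not in the table (spaces, digits) pass through unchanged.
    mapped = [LEGACY_TO_UNICODE.get(ch, ch) for ch in text]

    # Step 2: reorder the short-i matra from visual to logical order.
    out = []
    i = 0
    while i < len(mapped):
        nxt = mapped[i + 1] if i + 1 < len(mapped) else ""
        if mapped[i] == SHORT_I and nxt and is_consonant(nxt):
            out.append(nxt)      # consonant first
            out.append(SHORT_I)  # then the matra
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)
```

A real converter would need a complete table for each legacy font and further reordering rules (for conjuncts and the reph, for example), as well as format-specific handling for doc, xls, rtf and html files.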
Voterlist search implementation
In May 2004 we had several technologies and tools, and we were looking for a challenge in which these tools could be demonstrated in mass deployment. Just then there was a huge hue and cry in the press and electronic media about names missing from the voterlist during the Lok Sabha elections concluded in April. We announced our intention to use the Unicode standard to overcome the problem (see the Indian Express Marathi publication dated 9 May 2004). The Maharashtra state assembly elections were due in a few months (October 2004).
Proactively, and without much knowledge of the state of the art in electoral roll computerisation, we approached the State election authorities. Naturally, the election authorities were sceptical, but they were open and positively inclined to explore. We had the technology to break the language barriers in the long chain of the voterlist-making process and could make access to the list truly universal. In implementing the voterlist search engine we, in effect, broke the language barriers in a mass-deployment application.
The original data was in an ASCII-based encoding. We made the data available in the Unicode standard, searchable using modern tools and an RDBMS (PostgreSQL with tsearch2). We made the display available in Unicode as well as in non-Unicode form, and thus allowed access from the latest as well as the oldest operating systems. We provided interaction across the net, all while working entirely with free/open source software. We must admit that we used ‘dynamic font’ technology as a last resort to support computers where the user may not even wish to install the free ASCII-based Devanagari fonts on legacy MS Windows systems.
We demonstrated that there is no barrier to working in an Indian language, whether on the desktop or on the web, whether the system is new (Unicode compliant) or old, and whether the PC holder is rich or poor (with reference to licensing costs). We provided a smooth transcription from English to Indic languages and vice versa. We have also demonstrated interoperability, i.e. Unicode and non-Unicode content (whether ISCII standard or non-standard) can be seen from either side, and interaction is possible across the divide.
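To illustrate why data in Unicode becomes searchable with ordinary tools, the following Python sketch builds a tiny in-memory word index over Devanagari names. It is a standard-library stand-in for a database full-text index, not tsearch2 itself and not the code deployed for the voterlist; the sample records and the names `SAMPLE_ROLL`, `build_index` and `search` are invented for illustration.

```python
import unicodedata
from collections import defaultdict

# ASSUMPTION: invented sample records, not real electoral roll data.
SAMPLE_ROLL = {
    1: "सुनील पाटील",
    2: "अनिल पाटील",
    3: "सुनील जोशी",
}


def normalise(token):
    """Bring a token to one canonical Unicode form so equal names compare equal."""
    return unicodedata.normalize("NFC", token).casefold()


def build_index(records):
    """Map every word in every name to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, name in records.items():
        for token in name.split():
            index[normalise(token)].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    tokens = [normalise(t) for t in query.split()]
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

With this sample, searching for the surname shared by records 1 and 2 returns both ids, and adding a first name narrows the result to one. A database full-text index applies the same idea at scale, with persistence and ranking on top.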
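The transcription of names from English (Roman script) into Devanagari can be illustrated with a toy sketch that combines a dictionary lookup with a rule-based fallback. This is an assumption about how such a tool might be organised, not the Indictrans implementation: the tables are deliberately tiny, the dictionary entries are examples, and all function names are hypothetical.

```python
# ASSUMPTION: small illustrative tables; a real system needs far fuller ones.
NAME_DICTIONARY = {"patil": "पाटील", "ram": "राम"}  # known names, checked first

CONSONANTS = {
    "kh": "ख", "ch": "च", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "s": "स", "h": "ह",
}

# Each vowel: (independent letter, dependent sign used after a consonant).
VOWELS = {
    "a": ("\u0905", ""),         # inherent vowel: no sign after a consonant
    "aa": ("\u0906", "\u093E"),
    "i": ("\u0907", "\u093F"),
    "ee": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"),
    "oo": ("\u090A", "\u0942"),
    "e": ("\u090F", "\u0947"),
    "o": ("\u0913", "\u094B"),
}

VIRAMA = "\u094D"  # joins consecutive consonants into a conjunct


def tokenise(word):
    """Split a Roman word into units, preferring two-letter units such as 'sh'."""
    tokens, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and (pair in CONSONANTS or pair in VOWELS):
            tokens.append(pair)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def heuristic(word):
    """Rule-based fallback for names that are not in the dictionary."""
    out, prev_consonant = [], False
    for tok in tokenise(word.lower()):
        if tok in CONSONANTS:
            if prev_consonant:
                out.append(VIRAMA)  # consonant cluster: suppress inherent vowel
            out.append(CONSONANTS[tok])
            prev_consonant = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if prev_consonant else independent)
            prev_consonant = False
        else:
            out.append(tok)  # unknown character: pass through unchanged
            prev_consonant = False
    return "".join(out)


def transliterate(name):
    """Transliterate each word: dictionary first, heuristic otherwise."""
    words = name.split()
    return " ".join(NAME_DICTIONARY.get(w.lower()) or heuristic(w) for w in words)
```

The dictionary matters because Roman spellings are ambiguous: the rules alone cannot tell from "patil" that the name is written with a long vowel and a retroflex consonant, so a known-names list corrects what the heuristic gets wrong.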
While our software was put up on the web as free software under the GPL, the implementation was done for the Chief Electoral Officer (CEO) of Maharashtra by an agency already on rate-contract with the Maharashtra Government.
The software has been accepted by the CEO (Maharashtra), as part of the Election Commission, after serious testing by C-DAC Pune. The coverage was about 1.25 crore (12.5 million) voters. This implementation is seen as a pilot in Mumbai and Thane. For more details see the voter list page on www.indictrans.org.
Components of localisation: Felt needs
Apart from the voterlist search program, we came out with the following broad range of tools/solutions towards localisation. They are also our perception of the felt needs in the direction of deployment of localisation. These are also candidate tools for componentisation so that pluggable modules can be reused.
- System level localisation: OS (Locale specification and translations of GNOME) messages in Marathi and Gujarati. This work was halted at Indictrans in the hope that C-DAC project will take it up.
- Font: Created and maintained fonts for Devanagari and Gujarati (Unicode opentype, available on TDIL, performance on all platforms displayed on www.indictrans.org).
- Conversion: Text conversions from legacy font encodings to Unicode, to make way for open source software to be used. These included:
  - A major job undertaken for the Government of Maharashtra (see the letter of appreciation at http://www.indictrans.org/src/letter_of_appreciation.jpg). This included HTML and other formats.
  - Recent conversion of the HTML files of IGNCA to Unicode (as suggested at the Pune Localisation Review meet organised by TDIL).
  - Conversion of a sample of land record data of the Bhoomi project for NIC Karnataka.
  - Similar conversion for Rajasthan Raj Corporation.
  - Conversion of the MarathiWorld.com site (1800 pages from Mithi font to Unicode).
  - Conversions from Akruti font to Unicode for IIT.
- Inputbhaaratii: Several applications for inputting Indic text on the web were developed. These include a software implementation of the Inscript layout in Java (without the need for a plug-in like ISM), as also IIT Mumbai's KeyLekh layout (see http://www.indictrans.org/typebhaaratii/). Thus anyone can use Indian languages on the web without the server having a commercial plug-in, and the text is saved in Indian languages in Unicode.
- Naamabharatii (name transliteration, see http://www.indictrans.org/naamabhaaratii/): Many offices and lists hold Indian names in English, as is usual, and these need to be converted to an Indian language before Indian language systems can be adopted. There was no open source solution. The need was encountered in the Government of Maharashtra as also in LIC etc. A programme using a dictionary plus heuristics has been developed and is available on the Indictrans site as a demo.
- Application localisation: Horde, a messaging framework, has been translated into Marathi.
- Development of a bootable CD for localised GNU/Linux: Gnubhaaratii. See the tutorials on the site in Hindi, Gujarati, Marathi and English.
- Localised GIS: GRASS, a free GIS, is being used to put time-series data of the state election commission on a panchayat-wise map of Maharashtra, as a voluntary offer. This involves linking a PostgreSQL database to the GRASS mapping programme. Clickable maps for any geo-referenced or geographical database can be created.
- Localised software for rural development: As a consultant to the Tata Institute of Social Science (TISS), Mumbai, for their rural campus at Tuljapur (500 km to the south), a series of localised applications were developed for induction into the development and curricular (B.A. in Social Work) work of TISS. The setting up of the network and of the localised OS Gnubhaaratii with localised applications (like OpenOffice.org and Mozilla) was undertaken and completed, along with training.
- Localised geometry: DrGeo is a programme on interactive geometry. It has been localised with tutorials in Hindi, Marathi, Gujarati.
- CBSE IT curriculum: We have partially attempted to enable implementation of the CBSE curriculum in Indian languages using OpenOffice.org and other free/open source software, on GNU/Linux or on a proprietary platform.
- Localised editor: Yudit has been localised in Hindi, Marathi and Gujarati.
- Team building: Motivating and cultivating a team of science and other graduates to work on localisation.
With the introduction of GNU/Linux and open standards like Unicode in e-Governance, the nature of e-Governance applications has changed a lot. They are now more open to participation by users and more intuitive to operate, rather than being a responsibility imposed by management. This openness also fits nicely with the guidelines of the recently introduced ‘right-to-information act’.
The IT revolution has many aspects. It is therefore essential to look at one of its major aspects, the total cost of ownership for the nation as a whole, and to see beyond the bureaucratic barriers. From that perspective, free/localised software is the obvious, feasible and perhaps the only choice if the mass of people are to benefit from the IT revolution. If not, the digital divide will widen further.