Pan Localisation regional initiative: Developing local language computing

By Elets News Network, 01-June-2004

Information has become so integral to our society that access to it is considered a basic human right. The development of rural and urban populations in developing countries increasingly depends on access to information. This applies especially to Asia, which is home to the largest developing population. ICTs, including the Internet, are the largest repository of this information. Although Asians have been the largest group of Internet users since 2001, these users still make up only about 4.5% of the total Asian population. This shows that there is enormous potential for Internet usage in Asia.

However, in addition to being the most populous, Asia is also the most culturally and linguistically diverse region of the world. There are 2197 languages spoken in Asia, the largest number spoken in any one region. Only about 20% of Asians can communicate in English. This makes English-language content available through ICTs inaccessible to a large majority of Asians, and it particularly affects those living in the rural areas of developing Asian countries.

Investments have been made in developing ICT infrastructure in Asia. Nevertheless, the persisting digital divide shows that the current path of providing connectivity and technology infrastructure alone will not enable the majority of Asian populations to benefit from the information now available. Several problems perpetuate this divide. One obvious reason is that these populations cannot get around the obstacle of English-language content. Unless these large non-English-speaking populations can generate and access content in their native languages, they will not be able to use ICTs effectively for their development.

Enabling ICTs in the local language of the user is known as “localisation”. More specifically, it means enabling the computing experience in the linguistic culture of the user. Linguistic culture is not limited to the language itself; it also covers how the language is used in the user's environment. Thus, for Punjabi speakers in India the computer should display the language in Gurmukhi script, while for Punjabi speakers in Pakistan the same language should be displayed in Arabic script.
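The Punjabi example can be sketched in a few lines of Python. This is only an illustration: the locale tags `pa-IN` and `pa-PK`, the `SCRIPT_FOR_LOCALE` table and the `script_of` and `display_form` helpers are assumptions made for this sketch, not part of any standard discussed in this article. The script check relies solely on the character names in Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical table: locale tag -> (expected script, the word "Punjabi").
SCRIPT_FOR_LOCALE = {
    "pa-IN": ("GURMUKHI", "\u0a2a\u0a70\u0a1c\u0a3e\u0a2c\u0a40"),  # Gurmukhi spelling
    "pa-PK": ("ARABIC", "\u067e\u0646\u062c\u0627\u0628\u06cc"),    # Arabic-script spelling
}

def script_of(text):
    """Return the set of script names used in text.

    The script is taken from the first word of each character's Unicode
    name, e.g. "GURMUKHI LETTER PA" -> "GURMUKHI".
    """
    return {unicodedata.name(ch).split()[0] for ch in text if not ch.isspace()}

def display_form(locale_tag):
    """Pick the spelling for a locale and check it is in the expected script."""
    expected_script, word = SCRIPT_FOR_LOCALE[locale_tag]
    if script_of(word) != {expected_script}:
        raise ValueError(f"{locale_tag}: text is not purely {expected_script}")
    return word

if __name__ == "__main__":
    for tag in SCRIPT_FOR_LOCALE:
        print(tag, display_form(tag), sorted(script_of(display_form(tag))))
```

The point of the sketch is that the language alone ("Punjabi") does not determine what appears on screen; the locale of the user does.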

Localisation of ICTs requires the definition and implementation of standards. These standards include character set encoding, keyboard (and keypad) layout, collation/sorting sequence, locale and ICT terminology. In addition to defining standards, applications must also be developed for local language computing to support access to, and generation of, local language content. A large variety of applications is required, some more fundamental in nature and others more advanced and complex but equally vital for end users. These applications include fonts, lexicon, thesaurus, spell checker, grammar checker, text-to-speech system, speech recognition system, machine translation system and optical character recognition system.

Survey of state of localisation in Asia

Localisation and the development of applications are only just starting for many Asian languages. The reasons have been a lack of commercial incentives (these markets do not promise large financial returns for software vendors) and the complexity of the local Asian languages. A survey was conducted during a recent training on “Fundamentals of Local Language Computing”, held as part of the PAN Localisation project (details presented later) at Lahore, Pakistan, in January 2004. Localisation experts and developers from 13 different countries participated in this training and provided the data collated in the following tables.

Before any content can be generated or any application developed, some basic standards for encoding the language must be in place. These include character set encoding (e.g. Unicode), keyboard layout, keypad layout (e.g. for mobile telephones), collation sequence (to enable applications like databases), terminology translation and locale definition (to enable a computer interface in the local language). The survey responses are tabulated in Table 1.
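To make the idea of a character set encoding concrete, the following minimal Python sketch shows how a short Urdu word is represented as Unicode code points and then serialised as UTF-8 bytes. The helper names `code_points` and `utf8_hex` and the choice of sample word are assumptions made for illustration; UTF-8 is only one of several Unicode encoding forms.

```python
def code_points(text):
    """List the Unicode code point of each character, e.g. 'U+0627'."""
    return [f"U+{ord(ch):04X}" for ch in text]

def utf8_hex(text):
    """Return the UTF-8 byte sequence of text as space-separated hex."""
    return text.encode("utf-8").hex(" ")

if __name__ == "__main__":
    # The word "Urdu" in Arabic script: alef, reh, dal, waw.
    word = "\u0627\u0631\u062f\u0648"
    print(code_points(word))   # four code points, one per letter
    print(utf8_hex(word))      # eight bytes: two per letter in this range
```

Once every character has an agreed code point, keyboards, fonts and applications from different vendors can exchange the same text without conversion.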

This survey is limited to the countries from which representatives attended the PAN Localisation project training (see the “Training” link at www.PANL10n.net). The data was provided by the training participants and has not been independently verified, so the actual situation may differ somewhat from the responses received. The data is still representative of the bigger picture for the Asian region.

The responses indicate that encoding and keyboard layouts are standardised for most languages. This allows basic desktop publishing capability to be developed, and it has been achieved through national and international efforts, e.g. by organisations like the Unicode Consortium (www.unicode.org). However, much work remains to define the other standards needed to process the data further. For example, collation sequences for the languages have to be defined to enable applications which sort linguistic data, such as voter lists. Based on the standards, applications may be developed on Microsoft or Linux platforms, the two most popular end-user desktop operating systems. The survey also tried to determine the level of application support on these two platforms. The questions were divided into two categories of applications: basic applications, which realise the standards and allow basic desktop publishing for the end user, and advanced applications, which assist the user in generating and accessing content in local languages.
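The need for a defined collation sequence can be illustrated with a small Python sketch. Sorting by raw code point does not follow alphabetical order for many languages, so the sort key has to come from an agreed sequence instead. The `COLLATION_SEQUENCE` below is an assumed fragment of Urdu alphabetical order used purely for illustration, not a published national standard, and `collation_key` and `sort_words` are hypothetical helper names.

```python
# Assumed fragment of an Urdu collation sequence (illustrative only):
# alef, beh, peh, teh, tteh, theh, jeem
COLLATION_SEQUENCE = "\u0627\u0628\u067e\u062a\u0679\u062b\u062c"
RANK = {ch: i for i, ch in enumerate(COLLATION_SEQUENCE)}

def collation_key(word):
    """Build a sort key from each character's rank in the sequence.

    Characters missing from the sequence sort after all known ones,
    ordered among themselves by code point.
    """
    return [(RANK.get(ch, len(RANK)), ord(ch)) for ch in word]

def sort_words(words):
    """Sort words according to COLLATION_SEQUENCE."""
    return sorted(words, key=collation_key)

if __name__ == "__main__":
    letters = ["\u062b", "\u0679", "\u067e", "\u0628"]  # theh, tteh, peh, beh
    print(sorted(letters))      # code point order: beh, theh, tteh, peh
    print(sort_words(letters))  # sequence order:   beh, peh, tteh, theh
```

A production system would normally obtain such ordering from a locale-aware collation library rather than a hand-written table; the sketch only shows why the sequence must be standardised first.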

The basic applications include utilities that realise the encoding standard (keyboard and fonts), sort and search data (collation and find/replace utilities) and provide basic word processing facilities, such as a spelling checker, a thesaurus and Natural Language Processing components (e.g. a word/line break determiner for languages like Lao and Khmer, or bidirectional algorithms for Arabic-script-based languages like Urdu and Farsi). The responses for the Microsoft platform are tabulated in Table 2 and those for the Linux platform are given in Table 3.
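As a hedged illustration of what a bidirectional component has to do, the Python sketch below determines the base direction of a paragraph from its first strongly directional character, which is the first step of the Unicode Bidirectional Algorithm. It does not reorder text for display, so it is far from a full implementation. The function name `base_direction` is an assumption; `unicodedata.bidirectional` is part of the Python standard library.

```python
import unicodedata

def base_direction(text):
    """Return 'rtl' or 'ltr' based on the first strong character.

    Bidi classes 'R' and 'AL' (Arabic letter) are right-to-left,
    'L' is left-to-right. Digits, spaces and punctuation are skipped.
    Text with no strong character defaults to 'ltr'.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"

if __name__ == "__main__":
    urdu = "\u0627\u0631\u062f\u0648"      # the word "Urdu" in Arabic script
    print(base_direction(urdu + " Linux"))  # starts with an Arabic letter
    print(base_direction("Linux " + urdu))  # starts with a Latin letter
```

A word processor for Urdu or Farsi needs this and the remaining steps of the algorithm so that mixed Arabic-script and Latin text is laid out in the right visual order.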

As the responses indicate, there is currently more support on the Microsoft platform for the keyboards and fonts needed for basic local language data processing. Linux is catching up as more solutions are developed, but it does not yet provide the same level of support. It should be noted that not all of the support indicated for the Microsoft platform is developed by Microsoft; some of it is developed by third parties.

Additional utilities, of which collation is perhaps the most necessary for data processing (NLP is also significant for some languages), are still missing for most of the languages, and much work is needed to fill this gap. Japanese and Thai are ahead of all other languages surveyed.

To give both literate and illiterate non-English-speaking populations of Asia wider access to content, and to speed up content generation, some advanced applications must be developed. Automatic machine translation systems can provide instant access to existing English data on the Internet. Text-to-speech systems can provide access to illiterate populations. Automatic speech recognition can help create local language and culturally meaningful content quickly, and optical character recognition systems can similarly help convert published material into electronic content for exchange. These applications can be instrumental in bridging the digital divide. The status of these applications for Asian languages is given in Table 4 (for the Microsoft platform) and Table 5 (for the Linux platform).

As the survey responses show, although many initiatives are underway for various languages, hardly any significant development has been completed in this area. Only Japanese language applications are currently available. For many of the languages, efforts in these areas have not even started.

Research and development in these areas is usually guided by policy. The policies relevant to local language computing are a country's linguistic policies, its ICT policies and, specifically, its localisation policies. The survey also included questions regarding the existence of such policies. The responses are tabulated in Table 6.

Interestingly, although few countries have a localisation policy, many are working towards developing one. Most countries also have linguistic and IT policies, but these may or may not drive localisation policy. This needs to be investigated further.

As the survey indicates, localisation has been neither rigorously taken up nor consistently pursued in many Asian countries. To address these issues, the International Development Research Centre (IDRC), through its Pan Asia Networking (PAN) programme, has taken up a regional effort to develop local language computing capacity for Asia, in collaboration with the National University of Computer and Emerging Sciences (NUCES) through its Centre for Research in Urdu Language Processing (CRULP). This effort is called the PAN Localisation project.

PAN Localisation project

The PAN Localisation project focuses on documenting the problems and researching the solutions needed to enable localisation of ICTs. The project is unique in that it will be the first study of its kind to look at the common problems faced by the Asian region and to research a comprehensive solution. It is thus a timely and urgently needed initiative for underdeveloped Asian populations and will be instrumental in providing equitable access to information in this digitally divided information society.

The core objectives of the PAN Localisation project address the following three fundamental dimensions of localisation for Asian languages:

To develop sustainable human resource capacity in the Asian region for R and D in local language technology
To raise current levels of technological support for Asian languages
To advance policy for local language content creation and access across Asia for development

In this project, CRULP is coordinating efforts across Asia among ICT researchers, practitioners, linguists and policy makers from governmental agencies, universities and the private sector of six Asian countries, including:
Bangladesh: BRAC University, working for Bangla
Bhutan: Department of IT, Ministry of Information and Communications, working for Dzongkha
