Focus on regional languages grows as W3C India gets its own office

India's growing focus on Internet services in regional languages got a major push with W3C India getting its own space, the government releasing Text to Speech and OCR software for regional languages, and Twitter unveiling its Hindi website.
Written by Manan Kakkar, Contributor

On the 14th of September 1949, Hindi was adopted as the official language of India. The day has been celebrated ever since as Hindi Diwas (diwas = day). To mark the occasion this year, the week saw several key announcements around development and data creation in regional languages.

The President of India, in her speech, called for more projects like Shrutlekhan and MANTRA. Shrutlekhan is Hindi speech recognition software developed by IBM and the Center for Development of Advanced Computing (C-DAC). MANTRA is a translation tool used by government organizations and was developed by an arm of C-DAC.

Twitter, the popular micro-blogging and social networking website, announced a Hindi version of its website. As Anupam Saxena at Medianama points out, it is a half-baked implementation. However, one possible reason could be the availability of tools like Microsoft’s Indic Input tool.

The two concrete announcements on the topic came from the Minister of State for Communications and Information Technology, Sachin Pilot. As part of the Department of Information Technology’s (DIT) Technology Development for Indian Languages (TDIL) program, Sachin Pilot announced:

  • Text to Speech software for 6 Indian languages (Hindi, Marathi, Bangla, Telugu, Tamil & Malayalam)
  • Online Optical Character Recognition software for 2 languages (Hindi and Punjabi)

The Text to Speech software is available only on request and works on Windows and Linux. The OCR software can be accessed on the TDIL website. You need to register to try the OCR functionality and, unfortunately, my attempt at uploading and getting a result failed.

The second key announcement was the World Wide Web Consortium (W3C) getting a permanent address in India. The W3C is in charge of defining web standards that ensure websites and browsers can understand each other. Headed by Tim Berners-Lee, the organization has 19 offices in 20 countries. (Germany and Austria have a common office in Berlin.) W3C India has been operating since May 2010, but under the TDIL program. Focusing on a “regional web,” W3C India is now an entity under the DIT.

W3C India will be working closely with government organizations on e-governance projects and has come up with a best practices paper on the topic. The comprehensive document (PDF link) covers a wide variety of topics, including web technologies, mobile devices and fonts. India’s e-governance initiatives have received international interest, with countries looking to India for guidance on similar projects.

The diversity of languages across the Indian landscape is one of the key challenges in taking the Internet to the masses. While the state governments have been working on several e-governance platforms for citizens, making information available in regional languages so that these services can be used efficiently is as much a last-mile challenge as Internet infrastructure in India. The central government and corporate India are taking steps to improve infrastructure and drive Internet adoption in the country.

As the Wall Street Journal points out, Hindi might be India's social web language.
