
Linguistic balkanization and the Internet

Written by John Carroll, Contributor

A recent UN summit lamented the fact that domain names use only ASCII characters. Though this is not a problem for English speakers, given that they don't use accented characters, much less characters from other languages such as Russian, Chinese, or any language that uses a "non-Roman" alphabet, it galls those who speak languages that aren't English...

...or at least, it galls attendees to UN-inspired conferences. Granted, I am a native English speaker, and the fact that domain names only use non-accented Roman letters affects me less. On the other hand, has anyone really reflected on the fact that linguistic differences carry huge costs, both economic and social?

Computer programming is a truly international discipline, as you can't be a very effective programmer if you don't understand a fair bit of English. This is what enabled me to work in Switzerland alongside people whose native languages were Russian, Italian, German, Spanish, Polish, Chinese, Telugu (an Indian language) and French. Virtually all mainstream programming languages draw their keywords from English. In C#, you don't have a "pourchaque" that matches the functionality of the "foreach" loop. Most documentation is released first - and often only - in English. Most of the leading software companies are American, leading to a bias towards English. That bias is less of a "bias" when one considers that English is well-entrenched as the international trade language. When you have international conferences in Singapore, for instance, they aren't conducted in Chinese. Most VC firms assume you can pitch your idea in English.

Localization of computer applications is extremely important, as it is what makes a computer program or web site most usable to end-users (I NEVER write a site or product without localization in mind). I'll never pretend, however, that localization is easy to do, or that it is "cost free." I also make sure that the first supported language besides the local market language is English. That's not just American bias. If I want to get something out fast, the language that has the widest utility to the most people is English, because more people in the world understand that (at a minimum) than any other language.

Admitting the importance of localization, however, does not mean the protocols we use to communicate between internet nodes should be something that is localizable as well. Right now, every person on this earth understands how to write email addresses, and access a URL. It is a common system that is well understood and easy to use.

If we start to make it so that domain names and email addresses are "localizable," however, we start to create communication issues. First, do note that URLs aren't just used by humans. Programs regularly use URLs as an address. That English-language orientation in software development has proven very useful, and that utility extends to a consistent and simple addressing scheme exemplified in ASCII-character domain names.
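As an illustration of programs treating URLs as addresses, here is a minimal Python sketch of code that pulls the host name out of a URL and checks whether it is plain ASCII. The function name host_is_ascii and both example addresses are hypothetical, chosen only for this example, and the check stands in for whatever a real program would do with the address next.

```python
from urllib.parse import urlsplit


def host_is_ascii(url: str) -> bool:
    """Return True if the URL has a host name made only of ASCII characters."""
    # hostname is None when the string has no network location at all
    host = urlsplit(url).hostname
    return host is not None and host.isascii()


# Hypothetical addresses, used only for illustration
for url in ("http://www.example.com/index.html", "http://bücher.example/"):
    print(url, host_is_ascii(url))
```

The first address passes the check and the second does not, which is the sort of distinction every program that handles addresses would have to start making once host names stop being ASCII-only.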

Second, most Americans, or French, or Poles, or Russians won't be able to access a Chinese site if its address is written in Chinese characters. Granted, that assumes people understand the site once they are there, but as noted a) most sites that aim at a global audience support AT LEAST English, and b) reading a foreign language site is a lot easier to do if you know how to access it. It seems to me that you are less likely to bother trying to understand a foreign language you run across on the Internet if you can't find it in the first place (most westerners wouldn't have a clue how to use their keyboards to compose the characters of a Chinese URL).

Bottom line, the "concern" over the ASCII nature of URLs seems more designed as a means to appeal to populist sensibilities than a concern over something that really matters to regular people. The French government, as an example, is much more concerned about the use of English words by French citizens than French people themselves, who spice their French with "francophied" versions of English words as liberally as Texans sprinkle Tabasco sauce on their breakfast food.

People do need to get used to the fact that, as globalization makes cross-cultural contact ever more common, the impetus behind a common language increases exponentially. This is not necessarily a bad thing when one considers the human costs of linguistic difference. People have a lot more of an ability to resolve common differences if they can speak to each other at some basic level.

At the micro-level, however, ASCII URLs are just a simple expression of a need for cross-cultural communication in the digital world, one that doesn't preserve the balkanization resulting from differences in spoken and written languages. ASCII-language URLs are a good thing, IMO, just as English-oriented programming languages are a good thing. It is part of the market fabric that makes computer programming - and computing in general - one of the most global-oriented markets in existence.
