Linguistic balkanization and the Internet

A recent UN summit lamented the fact that domain names use only ASCII characters. Though this is not a problem for English speakers, given that they don't use accented characters, much less characters from other languages such as Russian, Chinese, or any language that uses a "non-Roman" alphabet, it galls those who speak languages that aren't English...

...or at least, it galls attendees at UN-inspired conferences. Granted, I am a native English speaker, so the fact that domain names use only unaccented Roman letters affects me less. On the other hand, has anyone really reflected on the fact that linguistic differences carry huge costs, both economic and social?

Computer programming is a truly international discipline, yet you can't be a very effective programmer if you don't understand a fair bit of English. This is what enabled me to work in Switzerland alongside people whose native languages were Russian, Italian, German, Spanish, Polish, Chinese, Telugu (an Indian language) and French. Virtually all mainstream programming languages use English keywords. In C#, you don't have a "pourchaque" that matches the functionality of the "foreach" loop. Most documentation is released first - and often only - in English. Most of the leading software companies are American, leading to a bias towards English. That bias is less of a "bias" when one considers that English is well-entrenched as the international trade language. When you have international conferences in Singapore, for instance, they aren't conducted in Chinese. Most VC firms assume you can pitch your idea in English.

Localization of computer applications is extremely important, as it is what makes a computer program or web site most usable to end-users (I NEVER write a site or product without localization in mind). I'll never pretend, however, that localization is easy to do, or that it is "cost free." I also make sure that the first supported language besides the local market language is English. That's not just American bias. If I want to get something out fast, the language that has the widest utility to the most people is English, because more people in the world understand that (at a minimum) than any other language.

Admitting the importance of localization, however, does not mean the protocols we use to communicate between internet nodes should be localizable as well. Right now, practically everyone who uses the Internet understands how to write an email address and access a URL. It is a common system that is well understood and easy to use.

If we start to make it so that domain names and email addresses are "localizable," however, we start to create communication issues. First, do note that URLs aren't just used by humans. Programs regularly use URLs as an address. That English-language orientation in software development has proven very useful, and that utility extends to a consistent and simple addressing scheme exemplified in ASCII-character domain names.

Second, most Americans, or French, or Poles, or Russians won't be able to access a Chinese site if its address is written in Chinese characters. Granted, that assumes people understand the site once they are there, but as noted a) most sites that aim at a global audience support AT LEAST English, and b) reading a foreign language site is a lot easier to do if you know how to access it. It seems to me that you are less likely to bother trying to understand a foreign language you run across on the Internet if you can't find it in the first place (most westerners wouldn't have a clue how to use their keyboards to compose the characters of a Chinese URL).

Bottom line, the "concern" over the ASCII nature of URLs seems more designed as a means to appeal to populist sensibilities than a concern over something that really matters to regular people. The French government, as an example, is much more concerned about the use of English words by French citizens than French people themselves are, who spice their French with "francophied" versions of English words as liberally as Texans sprinkle Tabasco sauce on their breakfast food.

People do need to get used to the fact that, as globalization makes cross-cultural contact ever more common, the impetus behind a common language increases exponentially. This is not necessarily a bad thing when one considers the human costs of linguistic difference. People have a lot more of an ability to resolve common differences if they can speak to each other at some basic level.

At the micro-level, however, ASCII URLs are just a simple expression of a need for cross-cultural communication in the digital world, one that doesn't preserve the balkanization resulting from differences in spoken and written languages. ASCII-language URLs are a good thing, IMO, just as English-oriented programming languages are a good thing. It is part of the market fabric that makes computer programming - and computing in general - one of the most global-oriented markets in existence.

John Carroll

About John Carroll

John Carroll has delivered his opinion on ZDNet since the last millennium. He has not been a Microsoft employee since May 2008. He is currently working at a unified messaging-related startup.

Talkback

57 comments
  • Writer Happens to be Rather Under-informed

    Contrary to the writer's assertions, native Hangul-character domain names have long been available to Koreans. Even such websites as "Google" can be typed in Korean, for instance, and the user will be taken to the appropriate Google page (likely to be Google's country-specific page for Korea, which comes courtesy of Google). That other language groups don't quite have it their way yet is perhaps a testament to the fact that Korea is the most wired nation in the world, where practically every household is connected to the Internet via broadband. Koreans, indeed, have been at the forefront of many innovations, including the push to enable Korean language domain names.
    Sirkyu
    • Fine...

      ...I've heard that there is some experimentation along those lines in China, but didn't know Korea had gone ahead and made Korean urls.

      What does that have to do with the other points made in this post?

      Yes, right now, if I go to ZDNET korea, I'm not going to understand what is being said. Machine translation just makes a humorous version of articles. I don't expect that to last, however.

      In future, when machine translation IS really good, having a universal structure for addressing things that ditches the ridiculous mess we have in terms of spoken (and written) communications in even a SMALL way will be considered a boon. Those sites that EVERYONE can access (which is, right now, ASCII url) will be the ones that get the most traffic.

      Like I said, korean language urls (or russian urls, or...) just ensure that NOBODY but koreans or russians ever access them. That's short term thinking, IMO.

      I'm well aware of Korea's advances in broadband. In fact, I've sung its praises in many blog posts and articles on the subject.
      John Carroll
    • More input please

      I actually speak some Korean (not very well, but...) and I was wondering how the Hangul-character DNS interacts with the standard ASCII-based one (or does it?). Is this a complete separate system, or is there some sort of transliteration that goes on?

      My concerns are much the same as John's, but since there's no chance that English will displace all other languages any time soon (and I'm not at all certain that I want it to) some accommodation for non-English languages and non-Roman character sets will have to be made one way or another (by popular demand, if nothing else). The important thing is to do it in a way that preserves the existing universal access (just because I can read Hangul, doesn't mean I can type it on my US keyboard). If what S. Korea has done already can help, so much the better.
      John L. Ries
  • Not a prob, John

    Give it a rest. The only people who actually [b]type[/b] URLs are the ones the site targets. If someone in .ru wants to do the local equivalent of "borscht.com" I'm all for it -- it frees up the ASCII "borscht.com" for people who don't normally take an interest in Cyrillic character sets.

    Like me.

    In the meantime, people in St. Petersburg can stop trying to figure out how their local borscht deli got mangled from its [i]real[/i] name into some English abomination. And if you think Russian is going to be bad, you've never dealt with Semitic languages.

    Meanwhile, as far as 'puters are concerned it's just another file handle. I don't parse file handles, and I don't expect programs to -- they're tokens, and they [b]shouldn't[/b] be transparent. Basic software engineering "need to know" issue, that.
    Yagotta B. Kidding
    • Think longer term...

      ...when machine translation is a lot more bulletproof. A world where EVERYONE knows how to generate a URL...however mangled it might seem from a local standpoint...is going to seem a boon.

      I really don't want us to extend linguistic incompatibilities into the digital age. Heck, most people use Indian numerals in phone numbers (which came to the west by way of Arabic mathematicians). There is value in a consistent schema for communicating an address, and a URL is just another instance of that.

      [i]Meanwhile, as far as 'puters are concerned it's just another file handle. I don't parse file handles, and I don't expect programs to -- they're tokens, and they shouldn't be transparent. Basic software engineering "need to know" issue, that.[/i]

      Fair point. Maybe a middle ground should be that EVERY site has an ASCII URL, but in each country they have a mapping table using local characters that links a local name to an ASCII name. So, Russians might use the Cyrillic version of "borscht.com" (a terribly popular website in Russia, I hear), whereas the rest of the world would use the ASCII borscht.com.
      John Carroll
      • The stupid GRINGOS need to learn a few things about other languages and

        cultures. I think it would be just fine if Gringos had to figure out how to type a name in Chinese if they want to go to a Chinese only site. FireTruck em if they can't figure it out.

        TELL US AGAIN, WHY DID MICROSOFT HIRE YOU???
        DonnieBoy
        • Please move to China.

          And start your stupid ranting there. Now that would be a true comedy...
          No_Ax_to_Grind
        • or if they don't like it....

          they can invent something better. We don't owe anyone a goddamn thing.
          JoeMama_z
        • I speak french

          ...and am starting to learn spanish. It's not an issue of whether I can learn languages. I just can't learn ALL languages, and there is value in having a consistent and simple addressing scheme that EVERYONE can generate from their keyboards.

          Why did Microsoft hire me? Because I rock.
          John Carroll
          • OMG - I spit my iced tea on the monitor!

            "Why did Microsoft hire me? Because I rock."

            Totally, man, and you just proved it, too!
            Confused by religion
          • Obvious need

            [i]Why did Microsoft hire me? Because I rock.[/i]

            Well, Steve certainly needs dance lessons.
            Yagotta B. Kidding
          • There's yet another kettle of fish you didn't bother diving into...

            [b]...and am starting to learn spanish. It's not an issue of whether I can learn languages. I just can't learn ALL languages, and there is value in having a consistent and simple addressing scheme that EVERYONE can generate from their keyboards.[/b]

            There's also the no longer little matter of site spoofing and phishing. Having a simple, easy to use schema is important to keep the site spoofing and phishing at bay.

            Yeah... I can see it now... In order to be sensitive to non-English speaking peoples, we have to adopt a code page that adopts not only English but Russian, Japanese, Chinese and Arabic characters. Yeah.. so now I need a new keyboard with 1000 odd keys to accommodate the extra characters, all operating systems will need to learn a new trick or two about being able to type in multiple directions on the fly, (Not all languages are written/read left to right) and anti-phishing software will need to learn how to distinguish the English C from the one used in the Cyrillic or Arabic alphabet.
            Wolfie2K3
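The lookalike-letter worry in this comment can be made concrete. What follows is only a rough sketch, not how any real anti-phishing product works: it guesses each letter's script from the first word of its Unicode character name and flags a label that mixes scripts. The function names `scripts_used` and `looks_mixed` and the sample strings are hypothetical, chosen for illustration.

```python
import unicodedata


def scripts_used(label):
    """Return a rough set of script names found in `label`.

    Assumption: the first word of a character's Unicode name
    (e.g. 'LATIN', 'CYRILLIC') is a good-enough stand-in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Fall back to 'UNKNOWN' for characters without a name.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label):
    """True when the letters in `label` come from more than one script."""
    return len(scripts_used(label)) > 1


latin = "cafe"         # all Latin letters
spoof = "\u0441afe"    # first letter is CYRILLIC SMALL LETTER ES

print(latin == spoof)        # the two strings are different code points
print(scripts_used(spoof))   # set of script names seen in the spoofed label
print(looks_mixed(latin), looks_mixed(spoof))
```

A check like this only catches mixed-script labels; a name written entirely in lookalike letters from one script would need a different test.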
      • That's one possibility

        Another would be to develop a standard way of transliterating non-Roman character sets and non-ASCII Roman letters (like the German ess-tzet or Romance accented vowels) into 7-bit ASCII. Wouldn't have to make a lot of sense phonetically (every written language has its own unique set of quirks, after all) as long as it's easy to translate everything back to ASCII.
        John L. Ries
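For what it is worth, a reversible, not-at-all-phonetic mapping of the kind suggested in this sub-thread can be tried with Python's built-in "idna" codec, which turns a non-ASCII hostname into an ASCII-only form and back. This is a minimal sketch: the hostname is a made-up example on the reserved `.example` domain, the helper names `to_ascii` and `to_unicode` are hypothetical, and nothing here describes how the Korean system mentioned above actually works.

```python
def to_ascii(hostname):
    """Encode a hostname into an ASCII-only form using the built-in idna codec."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(ascii_hostname):
    """Reverse the mapping and recover the local-script hostname."""
    return ascii_hostname.encode("ascii").decode("idna")


# Hypothetical Cyrillic name, echoing the "borscht" example in the thread.
local_name = "\u0431\u043e\u0440\u0449.example"   # "борщ.example"

ascii_name = to_ascii(local_name)
print(ascii_name)               # ASCII-only form that anyone can type
print(to_unicode(ascii_name))   # round-trips back to the Cyrillic name
```

The ASCII form is ugly but typeable on any keyboard, which is the property both sides of the argument seem to want preserved.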
  • It will be very easy to support 16 bit characters for domain names.

    For an all Chinese site, why not have a domain name in Chinese. English speaking people do not need to be able to go to that site. Same with Arabic, if the whole site is in Arabic, better to have an Arabic domain name. It is high time we supported other languages, and high time we gave control of the internet over to the UN.
    DonnieBoy
    • What ever gave you that stupid idea?

      "English speaking people do not need to be able to go to that site. Same with Arabic..."

      Maybe you missed the memo, can you say "World Market"?
      No_Ax_to_Grind
    • Give me one good reason....

      why we should give control over OUR internet to anyone else?
      JoeMama_z
    • Donnieboy, I have a friend just like you...

      And don't take this the wrong way, but he's infuriating to talk to. Gross oversimplification and complete disregard for "the real world". In a similar sense he makes these vast, sweeping "it would all just be better" statements regularly.

      Yes, wouldn't it be great if we could all just get along? Everyone just spoke their own language and lived in their own, isolated, quiet and peaceful corner of the world where the Americans didn't ruin everything for everyone and the UN just floated down on their winged unicorns and used their magic to cure all the world's ills. Sure, it would be great, but you're as cracked as the Liberty Bell if you think that idea has got a snowball's chance in hell.

      The author of the article is writing, as I think the UN and others are debating, whether to allow the sky to continue to be blue. It's always been blue, but some people think blue's too...American I guess. Perhaps we could change it to purple or some sort of aqua.

      The benefits of allowing language specific URLs are so vastly outweighed by the downfalls, that I can't see anybody jumping off the couch to start translating. Eventually, as he says, auto-translators will do the conversions for us.

      In the meantime, I think your argument that we (English speakers) need not see Arabic or Chinese sites is ludicrous, insulting and incredibly small-minded. What if I want to read about the Xinhua News? What if I want to see how the rebuilding of the Baghdad museum is coming but from the horse's mouth? Is this none of my business because I don't speak the same language? Localizing and cutting off access in this way essentially takes away what makes the internet so universal. Having a universal language to access it should be encouraged, not attacked.

      I think the argument's moot anyway. If any advertiser or retailer wants the American buck, they HAVE to put the URL in English.
      banquo79
    • People might well want to access foreign language sites

      After all, there are lots of people who either know or are trying to learn multiple languages, but computer keyboards really don't make that easy to support (my US keyboard only supports the characters used in American English, but without the cent sign); certainly doesn't make it intuitive to type Spanish (though German's OK), much less Korean.
      John L. Ries
    • Fine...

      ...so have that as a parallel system that is layered atop a consistent universal ASCII-based addressing system. That's fine, and acceptable. There is value, however, in having a single consistent addressing scheme that EVERYONE can generate from their keyboards, whether they are Russian or Chinese or French or speak Xhosa.
      John Carroll
    • Limited thinking will get you NOWHERE...

      [b]For an all Chinese site, why not have a domain name in Chinese. English speaking people do not need to be able to go to that site. Same with Arabic, if the whole site is in Arabic, better to have an Arabic domain name. It is high time we supported other languages, and high time we gave control of the internet over to the UN.[/b]

      Right. Be sure to remember that when you go to that ALL Chinese web site to find drivers for that motherboard (or other peripheral made in China) you just bought. What? You don't read Mandarin or Cantonese? Hard cheese! Guess you're SOL there getting that device to work.

      As far as the UN is concerned... They can take a flying leap. There's NOTHING the UN has done in the last few decades that's really worth squat.

      All they can do is condemn (oooh.. real scary!) dictators and the like for their bad actions. Like that's got 'em shaking in their boots. If anything we've got such FINE examples of UN progress - like the Oil for Food scandal with Sodom Insane and UN Secretary General Kofi Annan's otherwise worthless kid taking bribes.

      Right. So we'll just turn control over the Internet over to them and watch the whole of it go to heck in a handbasket.
      Wolfie2K3