Lost in translation: More on the crimes of Google translation

In the comments on my previous post, Stevej offers an explanation of why machine translation is so bad, concluding (go read the whole comment):

Try it again in a Western language and you'll find the results are much more consistent. It's just the difficulty of translating to/from Asian languages that you are focused on. It has nothing to do with Google's translate program.

I'm aware of the reasons for the engarblement. Unfortunately, those reasons have everything to do with what people think they get from Google's translation service: customers are offered clarity and get nonsense.

Here's my point, if it wasn't clear before: now that "translation" is becoming an element of search that people are using to create "news," as was the case with the Matsuzake story I discussed earlier, the consequences of presenting our poorly developed technology as "state-of-the-art" become clear. Bad data starts to drive out good, because poor translation gives the bad data an imprimatur of quality.

Thanks for explaining what you think the reason for the engarblement is, though I disagree with the notion that Japanese pictograms represent "ideas" and not words. Japanese relies heavily on kanji compounds: words built from characters that each represent what a word does in English, a specific concept, and that are combined to convey a specific meaning. "Fu," for example, can mean attach, append, or send, depending on what it is combined with. By your logic, German would make no sense either, yet you say it can be translated more reliably.

So I ran the first few lines of Goethe's "Nearness of the Beloved" through Google translation and got:

I think yours,
if me the sun glow of the sea radiates;
I think yours,
if the moon flare paints itself in sources.

The construction of the phrase "I think of you," the trope Goethe uses twice in this stanza, is pretty simple, yet this machine translation completely ruins it. The last word, which Google translates as "sources," should be "springs," sources of water bubbling to the surface.

By that logic, this suggests that German, too, is a language of ideas and not words. In fact, all languages are amalgams of symbols (words made of characters) and meaning that are difficult to translate.

In any case, Google translation doesn't stand up to the test of a Western language either.

My argument is that Google and other translation systems should come with better sourcing information to prevent the spread of misinformation, such as the possibly retranslated Matsuzake article I wrote about, and that where they do a worse job than no translation at all, they should stop offering the product. It's not honest to call what these systems do "translation."

We're actually using technology that reinforces the isolation of languages by making character sets the basis of reliability. Does anyone really think the Japanese speak to one another the way the translated article reads? Some people do, because they read these bad translations.

Comic renditions of language, like the lunatic phrases of the Matsuzake article, reinforce the idea that we are different, better, superior, just as drawing an Asian with big teeth and a coolie hat does. That is the basis of the jokes in Borat as well, though Sacha Baron Cohen manages to be genuinely funny rather than just offensive because he turns the mistranslation back on the bigots he meets. If you can't do a language justice, don't claim you can translate it. English is not the only clear language on the planet, though you'd get that idea if you relied on Google.

It isn't just Asian languages, as Goethe has shown us. I speak and read Russian well enough to pick my way through an article in Pravda, and I have seen butchered translations of Pravda pages posted on other sites, despite the fact that there is an excellent English-language version of the site. African languages, Arabic, Hindi and others are difficult to translate, too. Google doesn't offer "beta" versions for most of those because it knows the results would not be accurate.

There is apparently no industry-driven standard for what counts as acceptable translation. As we enter an information age, we're being served the equivalent of the snake oil and spoiled meat that made the early industrial age so deadly to ordinary folks in the days before the Food and Drug Administration.

So, until machine translation is accurate, it would be better if it were not advertised as reliable in any way. Calling something "beta" doesn't do enough to warn users that the data they get will be inaccurate.