This is the first in a series of blogs that aim to explore a variety of dark matters: scientific, economic, statistical, fictional and religious.
“Breaking Google Translate” was initially intended as a serious introduction to machine translation, and all of the errors in this blog were caused by Google Translate, not by me. The same goes for any sense of humour.
(This essay was first written in German and then translated into English by Google Translate.)
Statistical Machine Translation is a branch of Artificial Intelligence that translates documents written by the power of statistical analysis into various languages. Unlike other methods of computer-aided translation, Stat. MT. no involvement of human translators.
The history of Stat. MT begins in biblical times when Humpty Dumpty of the Tower of Babel fell and the king and his army first tried to find a way to save Humpty Dumpty and all the languages of humanity. For century after century there were only small paving, z. For example, bilingual dictionaries such as the Rosetta stone and common languages such as Esperanto, written by careful scientists. In spite of the work of these brave souls, it was quite clear that the King and his army could not recapture Humpty Dumpty, the Tower of Babel, or humanity.
But with the development of computer science in the second half of the twentieth century, scientists began to imagine the possibility of computers as fast, cheap Humpty-Dumpty (i.e., translation) repairers. First, researchers develop rule-based systems that function almost like human translators, i. These computer programs used grammatical rules and bilingual dictionaries to build word-to-word translations. However, because of this, this method encountered similar difficulties as human translators, namely the ambiguity of linguistic utterances, individual words but also whole sentences, and distinguished between the word order of the source and target languages. Such problems mean that rule-based translation systems require a tremendous amount of linguistic knowledge of each source and target language, which is very time consuming and costly. In practice, human translators worked even cheaper and faster than computer programs, and the dream of rebuilding Humpty Dumpty and the Tower of Babel disappeared.
But that’s not the end of this story. The basic problem of rule-based systems was the incapacity to decide between various possible meanings of words and contexts. Possibility and probability are the daily business of statistics and so in the 80’s computer scientists began to conduct research with statistical analysis of bilingual documents. These “example-based” systems contain a plurality of parallel corpora consisting of sentence pairs, each of which is a translation of each other. From these available translations, a probabilistic translation model is trained by assigning probabilities. When such a system encounters a new sentence, it compares the sentence with its database of examples and makes it likely that a word of the target language is a good translation, a word of the source language.
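For readers who want to see how a translation model can be "trained by assigning probabilities" from sentence pairs, here is a minimal sketch. It assumes one classic technique, IBM Model 1, which the essay itself does not name. The model learns a table of word translation probabilities with expectation maximization (EM). Each target word is split among the source words in its sentence in proportion to the current probabilities, and those expected counts are then renormalized. The function name `train_ibm_model1` and the three-sentence German-English corpus are made up for illustration. The sketch also leaves out the NULL word, word order, and everything above the level of single words.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM.

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    Hypothetical helper: basic IBM Model 1 without the NULL word.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # Before training, every target word is an equally likely translation.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (target, source) pairings
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target word among the source words of its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn the expected counts back into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


# Toy parallel corpus, invented for this example: (German, English) pairs.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for f in ["das", "Haus", "Buch", "ein"]:
    best = max((e for (e, g) in t if g == f), key=lambda e: t[(e, f)])
    print(f, "->", best, round(t[(best, f)], 2))
```

No dictionary is given to the program. "das" is pulled towards "the" because they appear together twice, and "Buch" towards "book" for the same reason. Once those pairings are likely, "Haus" and "ein" are left to explain "house" and "a". This is the "possibility and probability" idea in miniature.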
At the moment, this probabilistic reconstruction of Humpty Dumpty is taking on an image that resembles more than a portrait of Picasso and less than an egg. But machine translation is still a young subject. Of course, it is not ready to take over the role of human translators. Nevertheless, there are good reasons to be optimistic. In little more than a decade after the failure of rule-based translation systems, sample-based systems were built that are used by millions of people every day on the Internet. Today, these systems are practically not independent and users need to have some knowledge of both source and target languages in order to get an acceptable translation. But the ability to use the Internet as a giant lab and access to the massive databanks of international organizations suggest that machine translation should continue to evolve rapidly.
In the near future it will not be unreasonable to expect that for newspaper articles, technical documents and similar contributions, machine translation will soon be the fastest, cheapest and simplest translation method. And the flattened spheroid of Humpty Dumpty and his Tower of Babel will be even clearer.