Sunday, March 14, 2010




Google released Street View imagery for Hong Kong. Hong Kong is a fun and busy place where it’s easy to get lost, perhaps less so with these photo maps.






Reader Play, Google’s Slideshow of Interesting Stuff




OK, I can see myself wasting some time with this. Google Reader Play is a casual, push-style way to feed yourself distracting/enlightening/inspiring snippets, imagery and videos. You can star a specific page, Like it or share it, but just jumping from page to page using the bottom navigation works too.

Note that Reader Play is personalized. Google writes, “We use the technology behind Recommended Items in Reader to populate Reader Play with the most interesting content on the web. While you don’t need a Google account to use Reader Play, your experience will be personalized if you sign in.” Google says that “Reader Play adapts to your tastes” (click Like, and more stuff like that should appear, Google suggests).

You can also set this app to auto-play, which sort of clashes with specific YouTube videos though... it would have been smarter to wait until a video finished playing before moving to the next bit. Now, I noticed that the actual source or author of a particular piece ends up as a kind of by-the-way footnote in this stream of stuff – even clicking on the “from” link will merely load that blog into Reader Play, and not open the source site – but I guess that might be the way of RSS and/or the future.



No, we can’t translate "Yes we can"
By Roger Browne

Ten years ago, the best-available translation software analysed the source text to determine its structure: subject, object, nouns, verbs, phrases, etc. From the structure tree, a new text could be generated in the target language.

The precise details of Google’s translation algorithms are not published, but the structure tree is not the main mechanism. Instead, there is a corpus – an enormous database of parallel works. These are works available in more than one language as a result of a previous human translation.

Based on equivalents found in the corpus, Google obtains translations for various multi-word fragments from the source text, then blends those together into what is usually a coherent sentence in the target language.
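To make the fragment idea concrete, here is a minimal sketch of lookup-and-stitch translation. It is a toy, not Google's algorithm (which is unpublished): the phrase table entries, the `translate` function and the greedy longest-match strategy are all assumptions made for illustration.

```python
# Hypothetical English -> German fragment pairs, as if mined from a parallel corpus.
PHRASE_TABLE = {
    ("good", "morning"): "guten Morgen",
    ("my", "friend"): "mein Freund",
    ("thank", "you"): "danke",
    ("good",): "gut",
}


def translate(sentence, table=PHRASE_TABLE, max_len=3):
    """Greedily match the longest known fragment at each position."""
    tokens = sentence.split()
    keys = [t.lower() for t in tokens]  # look fragments up case-insensitively
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest fragment first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            fragment = tuple(keys[i:i + n])
            if fragment in table:
                out.append(table[fragment])
                i += n
                break
        else:
            # No fragment matched: the word passes through untranslated.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(translate("Good morning my friend"))
print(translate("Good morning Roger"))
```

Note the fallback branch: anything the table has never seen is simply copied into the output, which is one way untranslated text can survive a translation.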

The system doesn’t work so well on fragments that weren’t translated in the corpus. For example, the phrase “Yes we can” was used prominently in Barack Obama’s election campaign, and was therefore included untranslated in many foreign language news reports. You can see this in a search for [obama “yes we can”] on google.de.

As a result, Google Translate is not always able to translate that phrase, even when used in a context unrelated to Barack Obama.
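A small sketch shows how this could happen, assuming the system simply picks whichever target fragment was seen most often for a given source fragment. That selection rule, the `build_phrase_table` helper and all the counts below are invented for illustration; they are not real corpus data.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Map each source phrase to the target phrase seen most often with it."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source.lower()][target] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


# Made-up corpus: German news reports mostly quote the slogan in English...
german_pairs = [("Yes we can", "Yes we can")] * 3 + [("Yes we can", "Ja, wir können")]
# ...while the made-up Danish corpus mostly contains a real translation.
danish_pairs = [("Yes we can", "Ja, vi kan")] * 2 + [("Yes we can", "Yes we can")]

german_table = build_phrase_table(german_pairs)
danish_table = build_phrase_table(danish_pairs)

print(german_table["yes we can"])  # the English quote wins
print(danish_table["yes we can"])  # the translation wins
```

Under this assumption, the outcome depends only on which version is more common in each language's corpus, not on whether the sentence being translated has anything to do with Obama.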

In a test I performed today, I found that the phrase “Yes we can” was not translated into these languages:

Catalan, Czech, Dutch, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Slovak, Spanish, Turkish.

It was translated into these languages:

Afrikaans, Albanian, Arabic, Belarusian, Bulgarian, Chinese, Croatian, Danish, Estonian, Filipino, Galician, Greek, Haitian, Hebrew, Hindi, Icelandic, Indonesian, Irish, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Maltese, Norwegian, Persian, Romanian, Russian, Serbian, Slovenian, Swahili, Thai, Ukrainian, Vietnamese, Welsh, Yiddish.

What can we conclude from this? Probably that the corpus for each language on the first list includes a higher proportion of Obama campaign reports than the corpus for any language on the second list.
