{"id":306746,"date":"2010-09-27T08:45:10","date_gmt":"2010-09-27T15:45:10","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306746"},"modified":"2016-10-17T11:08:57","modified_gmt":"2016-10-17T18:08:57","slug":"software-aids-language-learners","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/software-aids-language-learners\/","title":{"rendered":"Software Aids Language Learners"},"content":{"rendered":"

By Gary Alt, Writer, Microsoft<\/em><\/p>\n

Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means \u201cin my humble opinion\u201d or F2F is \u201cface to face,\u201d but real, spoken languages, such as Spanish, Hindi, or Japanese. That\u2019s the notion that intrigued Ming Zhou<\/span><\/a>, Matt Scott, and their colleagues at Microsoft Research Asia<\/span><\/a> as they studied how the web\u2019s zillions of words, in scores of languages, could be utilized for exploring and learning new tongues.<\/p>\n

The resulting application, Engkoo, whose name loosely translates to \u201cEnglish vault\u201d in Mandarin Chinese, is a groundbreaking piece of software that takes advantage of natural-language-processing and speech technologies to build massive sets of bilingual terms and sentences. Engkoo currently helps Chinese speakers who are learning English, but the technology could be applied to any two languages that are widespread on the web.<\/p>\n

\u201cEngkoo is a new kind of language-assistance technology for Chinese people, to enable them to ultimately master English as a native speaker might,\u201d says Scott, development lead of the Innovation Engineering<\/span><\/a> group and Engkoo project manager. \u201cIt unifies human translation mined from the web, machine translation, and a language-learning experience into one user-friendly search-and-explore interface.<\/p>\n

\u201cBy continuously discovering and analyzing high-quality translations on the Internet, Engkoo can be used to close the ever-expanding translation gap between English and Chinese. The technology itself is language independent and can be extended to other language pairs in the future.\u201d<\/p>\n


Ming Zhou<\/p><\/div>\n

Adds Zhou, senior researcher and research manager of the Natural Language Computing<\/span><\/a> group:<\/p>\n

\u201cEngkoo aims to improve the quality of English learning and translation in China. New words are added to both Chinese and English every day, while other words change in meaning and usage. Traditional translation dictionaries can\u2019t keep up and don\u2019t always provide results that reflect current common usage.<\/p>\n

\u201cAdditionally, Engkoo addresses other challenges, such as the difficulty of finding fluent sources of English learning material in schools, the necessity of assisting information workers who increasingly need to communicate globally, and combating the proliferation of the \u2019Chinglish\u2019 found on many public signs and billboards around China.\u201d The latter are poor Chinese-to-English translations that sometimes amuse, but more often confuse, English-speaking visitors.<\/p>\n

\u201cEngkoo is not designed primarily for learning a new language from the ground up,\u201d Zhou says. \u201cRather, it\u2019s designed to be an asset to those who already use or study English, such as English-as-a-second-language (ESL) students.\u201d<\/p>\n

So how exactly does Engkoo work? Zhou explains.<\/p>\n

\u201cEngkoo is the synthesis of multiple research technologies,\u201d he says. \u201cThe primary technology is that of mining human translation knowledge from the web. This helps power other features. The mining works by scanning the web to find parallel Chinese and English content from separate web pages or within the same page, such as from news websites that may publish an article in both Chinese and English. Using multiple adaptive-pattern techniques, the system can extract out language-aligned sentences and words, thus powering Engkoo\u2019s sample-sentence and term-definition features. The system uses multiple statistical methods to filter out noisy data and rank the sentences and definitions\u2014similar, in some ways, to how a search engine works.\u201d<\/p>\n
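
The filter-and-rank step Zhou describes can be pictured with a small sketch. This is not Engkoo's actual method: the seed lexicon, the assumed length ratio, the weights, and the threshold below are all hypothetical values chosen for illustration. The sketch scores each candidate sentence pair by how many known translations it contains and by how plausible its length ratio is, drops low-scoring pairs as noise, and sorts the rest.

```python
# Hypothetical sketch of filtering and ranking mined sentence pairs.
# The lexicon, weights, expected ratio, and threshold are assumptions.

SEED_LEXICON = {
    'market': '市场',
    'bank': '银行',
    'rate': '利率',
    'weather': '天气',
}

def lexicon_coverage(english, chinese):
    '''Share of known English words whose translation occurs in the Chinese side.'''
    words = [w.strip('.,!?').lower() for w in english.split()]
    known = [w for w in words if w in SEED_LEXICON]
    if not known:
        return 0.0
    hits = sum(1 for w in known if SEED_LEXICON[w] in chinese)
    return hits / len(known)

def length_score(english, chinese):
    '''Score in [0, 1] that falls as the length ratio drifts from an assumed norm.'''
    en_len = len(english.split())
    zh_len = len(chinese)
    if en_len == 0 or zh_len == 0:
        return 0.0
    expected = 1.7  # assumed Chinese characters per English word
    ratio = zh_len / en_len
    return max(0.0, 1.0 - abs(ratio - expected) / expected)

def score_pair(english, chinese):
    '''Weighted mix of the two signals; the weights are illustrative.'''
    return 0.7 * lexicon_coverage(english, chinese) + 0.3 * length_score(english, chinese)

def filter_and_rank(pairs, threshold=0.5):
    '''Drop pairs scoring below the threshold, then sort best first.'''
    scored = [(score_pair(en, zh), en, zh) for en, zh in pairs]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)

candidates = [
    ('The bank raised the rate.', '银行提高了利率。'),
    ('The weather is nice.', '点击这里订阅我们的新闻通讯以获取更多信息'),
]
for score, en, zh in filter_and_rank(candidates):
    print(round(score, 2), en, zh)
```

In this toy run the second pair is discarded, because the Chinese side is page boilerplate rather than a translation. A production system would presumably combine many more signals than these two.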

The result is an enormous lexicon of bilingual terms and sentence paradigms. Engkoo then layers in information from existing dictionaries and reference sources, and, voil\u00e0, you have the world\u2019s largest lexicon linking Chinese and English.<\/p>\n

Engkoo\u2019s ability to mine the vastness of the web provides it with powerful capabilities, going far beyond crude translations. It analyzes the parallel Chinese and English websites and then ranks the quality of the translation, thus building an ever-expanding repository of terms and sentences, ranked by the reliability and elegance of the translation.<\/p>\n

When a user enters words or sentences into the Engkoo search box, the software combs through its ranked data set to find the best translation. This works in both directions; the search terms can be in Chinese or English. What\u2019s more, Engkoo provides sample sentences that show how the translated words and phrases are used, thus helping the learner grasp the nuances of the foreign language.<\/p>\n

How is this different from the many online translation applications already on the web? That\u2019s like asking how calculus is different from arithmetic.<\/p>\n


Matt Scott<\/p><\/div>\n

\u201cEngkoo is different because it seamlessly unifies dictionary, machine translation, and language learning into an easy-to-use interface,\u201d Scott says, \u201cproviding the user with more context, fresher results, more robust ways to avoid near-miss queries, and new ways to explore language. It excels relative to other services that specialize in a dictionary, machine translation, and language learning.\u201d<\/p>\n

Engkoo\u2019s lexicon is massive, comprising more than 10 million terms and sample sentences, at least twice the estimated lexicon size of the largest competitor in the Chinese market. But it\u2019s the source of this massive lexicon that really sets Engkoo apart. Because it uses novel web-mining technology to extract high-quality human translation knowledge from the web, it\u2019s essentially creating a dynamic dictionary from translated news transcriptions and other Internet content. What makes this useful is that it\u2019s \u201creal\u201d English: relevant and endlessly expanding.<\/p>\n

Achieving state-of-the-art quality in machine translation (MT) required a collaborative effort across Microsoft groups and partners in academia. The key Microsoft partnership was between the Natural Language Processing<\/span><\/a> group at Microsoft Research Redmond<\/span><\/a>, led by Bill Dolan<\/span><\/a>, and the Natural Language Computing group. The groups have worked together on many fruitful MT collaborations. For instance, in 2008, the groups, along with other research partners, received the top Chinese-English MT quality ranking in the National Institute of Standards and Technology\u2019s prestigious Open MT evaluation series. Building on those research results, which were incorporated into Engkoo, the groups also worked together on other initiatives, including the Chinese-English MT engine now used in Bing Translator<\/span><\/a>.<\/p>\n

But it\u2019s as a language-learning service that Engkoo shines, exposing novel, useful learning features not found in any competing product. For example, studies have shown that English learners in China find it difficult to compare two similar English words, such as \u201ctaught\u201d and \u201cinstructed.\u201d Engkoo addressed this challenge with an innovation that harnesses research from the area of human-computer interaction. The resulting comparison tool enables users to search for a word and then, within a tabbed window environment, search for similar words. Each word appears as its own tab, which can be dragged and dropped for side-by-side comparisons. That way, users can compare sets of similar words, complete with definitions and sample sentences. This comparison tool has been hailed in the Chinese press and online community.<\/p>\n

Engkoo also provides the ability to explore example sentences by categorizing them by difficulty or domain. Users can learn at their own rate by selecting easy, medium, or difficult English, and they can choose English from domains such as written, oral, or technical. These classifications were performed through a novel machine-learning technique and applied on a massive scale.<\/p>\n


‘Fuzzy’ search deployed within Engkoo.<\/p><\/div>\n

Relative to other language-learning tools, Engkoo offers a unique, phonetic-based “fuzzy” search adapted to the local pronunciation habits of mainland Chinese speakers. By studying users, the Engkoo team discovered that Chinese ESL learners often search for words as they sound, such as those they heard from foreign colleagues or from music or television. So, for example, a Chinese user might search for \u201cshampin,\u201d which mainland Chinese speakers commonly say when their intent is \u201cchampagne.\u201d Such behavior reveals a major limitation of other language-learning services, because many of those words cannot be found, so the learning process stops abruptly. Engkoo, by contrast, maps such queries to the intended words, enabling the user not only to find them, but also to learn the correct English spelling.<\/p>\n
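
A rough way to picture sound-based lookup is to reduce both the query and every dictionary word to a crude phonetic key and compare the keys. The sketch below is an assumption-laden illustration, not the technique Engkoo uses: the rewrite rules, the vowel-dropping key, the tiny vocabulary, and the similarity cutoff are all hypothetical, and the matching relies on Python's standard difflib module.

```python
import difflib

# Hypothetical spelling-to-sound rewrite rules; a real system would model
# local pronunciation habits far more carefully.
RULES = [('ch', 'sh'), ('gn', 'n'), ('ph', 'f'), ('ck', 'k')]
VOWELS = set('aeiou')

def phonetic_key(word):
    '''Crude key: apply rewrite rules, keep the first letter, then drop
    later vowels and immediately repeated letters.'''
    word = word.lower()
    for old, new in RULES:
        word = word.replace(old, new)
    key = word[:1]
    for ch in word[1:]:
        if ch in VOWELS or ch == key[-1]:
            continue
        key += ch
    return key

def build_index(vocabulary):
    '''Map each phonetic key to the dictionary words that share it.'''
    index = {}
    for word in vocabulary:
        index.setdefault(phonetic_key(word), []).append(word)
    return index

def fuzzy_lookup(query, index, cutoff=0.8):
    '''Return dictionary words whose key is close to the key of the query,
    closest keys first.'''
    key = phonetic_key(query)
    close_keys = difflib.get_close_matches(key, list(index), n=3, cutoff=cutoff)
    return [word for k in close_keys for word in index[k]]

index = build_index(['champagne', 'campaign', 'shampoo', 'telephone'])
print(fuzzy_lookup('shampin', index))
print(fuzzy_lookup('telefone', index))
```

With this toy vocabulary, the query shampin and the word champagne reduce to the same key, so champagne comes back as the first suggestion and the learner sees the correct spelling. The rules are deliberately coarse and would over-match on a real dictionary.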

The ability to observe the alignment of translated words or phrases in bilingual sample sentences is yet another groundbreaking tool in Engkoo. As the user mouses over either the Chinese sentence or the English translation, the corresponding words are highlighted in both. The alignment information not only clearly exposes the structural differences of translated sentence pairs, but also provides instant translations.<\/p>\n
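
The mouse-over behavior can be sketched as a lookup over word-alignment links. Everything in the sketch is hypothetical: the sentence pair, the hand-written alignment, and the function names are illustrative, and it assumes the alignment is already available as pairs of token positions.

```python
# Hypothetical aligned sentence pair. Each link is
# (index of English token, index of Chinese token).
TOKENS_EN = ['I', 'like', 'green', 'tea']
TOKENS_ZH = ['我', '喜欢', '绿茶']
ALIGNMENT = [(0, 0), (1, 1), (2, 2), (3, 2)]

def build_lookup(alignment):
    '''Index the links in both directions so either side can be hovered.'''
    en_to_zh, zh_to_en = {}, {}
    for en_i, zh_i in alignment:
        en_to_zh.setdefault(en_i, []).append(zh_i)
        zh_to_en.setdefault(zh_i, []).append(en_i)
    return en_to_zh, zh_to_en

def words_to_highlight(side, position, alignment=ALIGNMENT):
    '''Return the words on the opposite side linked to the hovered token.
    side is 'en' or 'zh'; position is the index of the hovered token.'''
    en_to_zh, zh_to_en = build_lookup(alignment)
    if side == 'en':
        return [TOKENS_ZH[i] for i in en_to_zh.get(position, [])]
    return [TOKENS_EN[i] for i in zh_to_en.get(position, [])]

print(words_to_highlight('en', 2))  # hovering over 'green'
print(words_to_highlight('zh', 2))  # hovering over the Chinese word for green tea
```

Hovering over the single Chinese word for green tea highlights two English words, which is the kind of structural difference between the two sentences that the alignment view exposes.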

Engkoo also provides the ability to learn and explore native English by statistically finding words that frequently appear near one another, or \u201ccollocations,\u201d patterns that would be difficult, if not impossible, to discover on one\u2019s own without reading an enormous amount of English text. The feature relies on a novel technique that leverages part-of-speech wild cards. For example, users can find prepositions that typically follow the word \u201cterrific\u201d by simply searching for \u201cterrific prep.\u201d In this example, they could find sentences such as \u201cI think it looks terrific on you.\u201d These sentences are statistically significant because they are derived from web-scale language knowledge.<\/p>\n
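
A part-of-speech wild card can be pictured as a query token that matches a tag instead of a literal word. The sketch below is illustrative only: the hand-tagged mini corpus, the tag names, and the functions are assumptions, whereas the article describes statistics drawn from web-scale text.

```python
from collections import Counter

# Tiny hand-tagged corpus (an assumption for illustration). Each sentence
# is a list of (word, part-of-speech tag) pairs.
TAGGED_SENTENCES = [
    [('i', 'pron'), ('think', 'verb'), ('it', 'pron'), ('looks', 'verb'),
     ('terrific', 'adj'), ('on', 'prep'), ('you', 'pron')],
    [('she', 'pron'), ('is', 'verb'), ('terrific', 'adj'), ('at', 'prep'),
     ('chess', 'noun')],
    [('that', 'det'), ('dress', 'noun'), ('looks', 'verb'),
     ('terrific', 'adj'), ('on', 'prep'), ('her', 'pron')],
    [('what', 'det'), ('a', 'det'), ('terrific', 'adj'), ('idea', 'noun')],
]
POS_TAGS = {'prep', 'noun', 'verb', 'adj', 'pron', 'det'}

def token_matches(token, tagged_word):
    '''A query token that names a tag matches by tag; any other token
    must match the word itself.'''
    word, tag = tagged_word
    return token == tag if token in POS_TAGS else token == word

def collocations(query, sentences=TAGGED_SENTENCES):
    '''Count every word sequence that fits the query pattern, most frequent first.'''
    pattern = query.lower().split()
    counts = Counter()
    for sentence in sentences:
        for start in range(len(sentence) - len(pattern) + 1):
            window = sentence[start:start + len(pattern)]
            if all(token_matches(t, tw) for t, tw in zip(pattern, window)):
                counts[' '.join(word for word, _ in window)] += 1
    return counts.most_common()

print(collocations('terrific prep'))
# [('terrific on', 2), ('terrific at', 1)]
```

On this toy corpus the query surfaces on as the most common preposition after terrific. Counting over a much larger corpus is what would make such rankings statistically meaningful.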

Finally, there is the text-to-speech (TTS) feature in Engkoo that can convert input text into natural-sounding speech. This has proved to be one of users\u2019 favorite features, Scott notes. The state-of-the-art TTS technology has been developed and refined continuously by researchers in the Speech Group<\/span><\/a> at Microsoft Research Asia. The TTS used in Engkoo recently was rated as the best in intelligibility in the international TTS contest, Blizzard Challenge 2010, in both English and Chinese.<\/p>\n

The text-to-speech technology is based on a sophisticated, statistically trained model that succinctly captures the phonetic and rhythmic characteristics of native English speech. The model is then capable of synthesizing the sound evolution, the ups and downs of intonation change, and the stressed or unstressed points of any given sentence. Besides being phonetically accurate, the spoken rhythm of the synthesized sentences is close to that of a native English speaker. This is invaluable to ESL learners, because the spoken rhythm is extremely difficult for a non-native English speaker to produce.<\/p>\n

Pronunciation and Intonation<\/h2>\n

\u201cEngkoo\u2019s speech synthesis has good pronunciation, and the intonation is not bad,\u201d says Frank Soong<\/span><\/a>, principal researcher and research manager of the Speech Group. \u201cOur English text-to-speech actually speaks better English than most of the English teachers in China.\u201d<\/p>\n

The TTS user interface is designed to facilitate easy playback and downloading to a user\u2019s MP3 player for later listening and practicing. There are more than a million MP3 downloads per month from the Engkoo website.<\/p>\n

Like most research developments, Engkoo, Zhou explains, rests on a strong foundation of past work.<\/p>\n

\u201cThe project began over a decade ago,\u201d he says. \u201cThe impetus for this research is one of the quintessential quests of the fields of natural language processing and computational linguistics: for computers to effectively assist people in understanding and using a foreign language.\u201d<\/p>\n

The group\u2019s first project was the English Writing Wizard, a program designed for ESL learners, which featured a manually compiled lexicon. This evolved into English Writing Assistant, a feature that shipped in Office 2003. The next step toward Engkoo took place during Microsoft Research TechFests in 2007 and 2008, with the Natural Language Computing group generating buzz around Lingo, a demo prototype of a language-learning tool that had evolved from the English Writing Assistant. That\u2019s when the Innovation Engineering group took over.<\/p>\n

Working with experts in the fields of speech processing, web-data management, and human-computer interaction, they refined Lingo to meet the needs of language learners, eventually launching engkoo.com<\/span><\/a> in 2009. The team continued to develop Engkoo\u2019s user scenarios and underlying technology in a process they call \u201cdeployment-driven research,\u201d and by early 2010, it was ready to release the feature broadly, shipping it in China as a part of Bing, where it is now called Bing Cidian, \u201ccidian\u201d translating to \u201cdictionary\u201d in English. This wide release brought millions of new users and established Engkoo as a highly popular product in the Asian market. Engkoo recently was named a finalist for The Wall Street Journal\u2019s 2010 Asian Innovation Awards<\/span><\/a>.<\/p>\n

Building on such success, the researchers plan to apply the Engkoo technology to new language pairs, such as Japanese and English. In the meantime, it\u2019s great to have two of the world\u2019s most widely spoken languages linked together in such a powerful learning tool.<\/p>\n","protected":false},"excerpt":{"rendered":"

By Gary Alt, Writer, Microsoft Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means \u201cin my humble opinion\u201d or F2F is \u201cface to face,\u201d but real, spoken languages, such as Spanish, Hindi, or Japanese. That\u2019s the notion that intrigued Ming Zhou, Matt Scott, and their colleagues at […]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[194456,194462],"tags":[214886,200699,214895,201475,214883,214898,214901,214892,214880,186515,214889,186936,214904],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-306746","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","category-speech-and-dialog","tag-bilingual-terms","tag-bing-translator","tag-chinese-to-english-translations","tag-engkoo","tag-english-vault","tag-english-as-a-second-language","tag-esl-students","tag-language-assistance-technology","tag-learn-a-language","tag-machine-translation","tag-mandarin-chinese","tag-natural-language-processing","tag-the-wall-street-journals-2010-asian-innovation-awards","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560,199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144735,144736,144778],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"September 27, 
2010","formattedExcerpt":"By Gary Alt, Writer, Microsoft Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means \u201cin my humble opinion\u201d or F2F is \u201cface to face,\u201d but real, spoken languages, such as Spanish, Hindi, or Japanese. That\u2019s the notion…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306746"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=306746"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306746\/revisions"}],"predecessor-version":[{"id":306764,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306746\/revisions\/306764"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=306746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=306746"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=306746"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=306746"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=306746"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-even
t-type?post=306746"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=306746"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=306746"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=306746"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=306746"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=306746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}