{"id":307538,"date":"2008-03-05T12:00:52","date_gmt":"2008-03-05T20:00:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=307538"},"modified":"2016-10-18T17:07:09","modified_gmt":"2016-10-19T00:07:09","slug":"translating-web-entire-world","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/translating-web-entire-world\/","title":{"rendered":"Translating the Web for the Entire World"},"content":{"rendered":"

By Rob Knies, Managing Editor, Microsoft Research<\/em><\/p>\n

People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions.<\/p>\n

But is the World Wide Web truly worldwide?<\/p>\n

It\u2019s difficult to make the case. Estimates claim that approximately 70 percent of Web pages today are created in the English language, while the percentage of non-English speakers is growing faster than that of English speakers. So what if you don\u2019t speak English? Or what if you do and you find an interesting page written in German? Or Russian? Or Chinese?<\/p>\n

Microsoft Research aims to please.<\/p>\n

Windows Live Translator<\/a>, a free translation portal and a Web service that powers many other translation scenarios, is the result of more than eight years of diligent machine-translation effort within Microsoft Research. With it, Microsoft Research offers a simple, intuitive translation service\u2014while making ongoing improvements to translation quality. In addition to the portal, its Bilingual Viewer features a unique, side-by-side Web-page viewer that translates entire Web pages with blinding speed between 25 sets of language pairs.<\/p>\n

For Stephen Richardson and Heather Thorne, who are leading an effort to evangelize Microsoft Research\u2019s machine-translation work for incorporation into a bevy of other Microsoft products and services, Windows Live Translator points the way to a future when the contents of the entire Web will be free of language-based limitations and it will be easy for users to communicate with people everywhere, from within any Microsoft product or Web service.<\/p>\n

\u201cThis,\u201d says Richardson, principal researcher in the Natural Language Processing<\/a> (NLP) group within Microsoft Research Redmond<\/a>, \u201cis a technology that will literally change the way the world works. We\u2019re in a place, here at Microsoft, where that can happen.\u201d<\/p>\n

The group\u2019s machine-translation technology was showcased during a couple of events in early March. MIX08<\/a>, Microsoft\u2019s ongoing conversation with next-generation Web and interactive-agency professionals, scheduled in Las Vegas from March 5 to 7, featured the integration of Windows Live Translator with the upcoming version of Internet Explorer<\/a>. And during TechFest 2008<\/a>, an annual gathering set in Redmond on March 5-6 in which Microsoft employees and media representatives from around the world got a chance to observe and discuss the latest projects from Microsoft Research\u2019s worldwide labs, current features and services, as well as future plans, were on display.<\/p>\n

\u201cOur vision,\u201d Richardson says, \u201cis to produce a machine-translation system and technology that can provide translation across all of the potential scenarios we can imagine, with Microsoft products and services around the world.\u201d<\/p>\n

It\u2019s been a long journey for Richardson, who began working on machine translation while an undergraduate in the 1970s.<\/p>\n

\u201cI was a junior in college,\u201d he recalls, \u201cand I was on a project where we were trying to create a machine-translation system that we felt would change the world. Everybody\u2019s dream, right?<\/p>\n

\u201cOf course, it took a lot longer than I ever dreamt. But to be here now, involved with this great group of people putting out something that just has killer-app potential \u2026\u201d<\/p>\n

Thorne, director of business strategy for the Machine Translation<\/a> product team, comes to the project from an entirely different direction. Having studied Russian and International Studies during her undergraduate days, she found herself doing translation and interpretation while working for NASA on its joint space program with the Russians, and that led her to explore the state of the art of machine translation.<\/p>\n

\u201cGranted,\u201d she says, \u201cthis was 15 years ago. I remember discovering that quality was quite low. It was not able to replace the need for human translators.\u201d<\/p>\n

For certain uses, though, this is slowly changing.<\/p>\n

Four years ago, Thorne found her way to Microsoft, working for the Windows<\/a> organization. Then she heard about Microsoft Research\u2019s machine-translation work.<\/p>\n

\u201cWhen I discovered this team and what they were looking to do, that was a perfect fit for my background and my area of interest,\u201d she says. \u201cI realized that this would be a great opportunity to bring the experiences I\u2019d had in much bigger businesses into this small team, which felt much more like a startup.\u201d<\/p>\n

She joined NLP in March 2007 and has played an integral role in guiding the team\u2019s strategy toward integration of machine-translation technology into Microsoft offerings. For example, the team\u2019s scalable Web service is being applied to address specific user scenarios, such as integration into Live Search<\/a>, Internet Explorer, Windows Live Messenger, Office<\/a>, and many other products and services. Users can download a widget<\/a> that they can employ to add Translator to their own Web sites, and individuals can install a Windows Live Translator toolbar button<\/a> for translations with a mere click. With twice the number of downloads from non-English-speaking markets compared with English-speaking markets, it\u2019s clear that this service meets a need for international audiences.<\/p>\n

Still, it\u2019s been a formidable challenge to reach this point. Machine translation is a tough nut to crack. For a long time, machine translation was seen as largely unhelpful; users became frustrated with technology that often turned text in one language to gobbledygook in another.<\/p>\n

\u201cMachine translation had this bad reputation,\u201d Richardson recalls, \u201cof being unreadable sometimes.\u201d<\/p>\n

Perfection was proving stubbornly elusive. As it turns out, perfection itself was part of the problem.<\/p>\n

\u201cThere was an acronym from the 1960s: FAHQT\u2014fully automatic high-quality translation of general text,\u201d Richardson says. \u201cThat was the holy grail of machine translation. That\u2019s what everybody was trying for.\u201d<\/p>\n

FAHQT, though, turned out to be unrealistic. A couple of years ago, Jaap van der Meer, a pioneer in the translation industry, coined a new, more achievable acronym: FAUT\u2014fully automatic useful<\/em> translation. Instead of trying to devise a system robust enough to fool your school\u2019s infallible French teacher, how about developing one sufficiently accurate to deliver translations of real value to real users in real time?<\/p>\n

\u201cWhat we\u2019re trying to do is say, \u2018You know, machine translation as a science is not perfect,\u2019\u201d Richardson says. \u201cIt\u2019s far from perfect\u2014just as search is far from perfect today. But there are a lot of things you can do to mitigate the imperfections and help customers get to the results they\u2019re looking for.\u201d<\/p>\n

On one hand, there are user-interface improvements, such as the Bilingual Viewer, showing side-by-side Web-page translations that enable a user to compare a translation to the original. On the other hand, there are ways to improve the research process itself to deliver the right degree of accuracy to the right user in the right situation.<\/p>\n

Enter MSR-MT, Microsoft Research\u2019s machine-translation project.<\/p>\n

MSR-MT is a data-driven machine-translation system behind Windows Live Translator that automatically acquires translation knowledge from previously human-translated text, combining linguistic knowledge and statistical processing into a hybrid approach. Using as input data millions of sentences from Microsoft technical materials that have been translated by humans, MSR-MT is capable of producing output in a single night that is on a qualitative par with systems that require months of human customization.<\/p>\n

The system already has proven its value within Microsoft, having been used in 2003 to translate nearly 140,000 customer-support Knowledge Base<\/a> articles into Spanish. The effort was extended to Japanese the next year and to French and German in 2005. Now, Microsoft\u2019s Knowledge Base materials have been translated into nine languages by MSR-MT.<\/p>\n

Such success has lowered the cost barrier to obtaining customized, higher-quality machine translation and has enabled weekly updates and additions, a goal previously out of reach. Bill Gates<\/a>, Microsoft chairman, gave the mature technology the green light in 2005, and things took off from there.<\/p>\n

\u201cWhat we focused on the past year or two was to take the work we\u2019ve used internally here at Microsoft and make it available outside the company in the most compelling initial scenario we could identify, which turned out to be Search,\u201d Richardson says, \u201cand then build a backbone system, a Web service that could not only supply translations to Search, but also would be the basis for anything else that we did in the future.\u201d<\/p>\n

The data-driven approach, Thorne adds, also enables Microsoft Research\u2019s machine-translation efforts to focus on customer needs.<\/p>\n

\u201cGiven that we probably can\u2019t translate everything well,\u201d she says, \u201cwe need to do a good job of understanding which Web sites people are looking at and what they are asking us to translate. What are the areas of the Web that people are really interested in?<\/p>\n

\u201cIf we have limited resources and limited amounts of data we can get, where do we need to focus our efforts? It\u2019s a combination of the technology getting better and us doing a better job of understanding the customer need.\u201d<\/p>\n

Such work, of course, requires the efforts of many, as Richardson and Thorne are quick to note. Andreas Bode, the team\u2019s development lead, has been instrumental in creating the Web-service infrastructure and leading all development. Chris Wendt<\/a>, lead program manager, has worked closely with the other product teams to ensure successful integration of the Windows Live Translator Web service into their products. David Darnell has overseen the testing of the technology, and Arul Menezes<\/a> and Chris Quirk<\/a> were key contributors to the MSR-MT technology itself.<\/p>\n

In addition, collaboration with the Live Search team has proved essential, and the Windows International organization has provided avid support.<\/p>\n

\u201cThe reason why we have so many languages and gotten all the data we\u2019ve gotten across Microsoft,\u201d Richardson says, \u201cis because of the effort by the internal localization community, which was spearheaded by the Windows International group.\u201d<\/p>\n

That team also devised the side-by-side interface that makes Windows Live Translator so easy to use. Initially, the user interface was called the Flipper Flopper. That whimsical contribution has evolved into one of the technology\u2019s most popular features, the Bilingual Viewer.<\/p>\n

It\u2019s no surprise that much remains to be accomplished. New subject domains are being investigated, and product integration remains central to ongoing efforts.<\/p>\n

\u201cWe\u2019re always looking at improving the quality,\u201d Richardson says, \u201cand the more of the right kind of data that you have, and the more you do with it, the better quality you can get.\u201d<\/p>\n

For Thorne, it\u2019s been an invigorating experience.<\/p>\n

\u201cIt\u2019s really, really exciting to be so close to a product where the people I sit next to are literally the guys who wrote the code,\u201d she says. \u201cIt\u2019s a very complicated space, and yet it\u2019s still something that you can see and touch in this very tangible way. Everybody takes a lot of pride in what they do, and it\u2019s really exciting to see the progress and to see everybody\u2019s commitment to it.\u201d<\/p>\n

Richardson agrees wholeheartedly.<\/p>\n

\u201cWe\u2019ve always been a tight-knit group at NLP,\u201d he says, \u201cbut our machine-translation incubation group has worked their tails off to produce something that has jumped to the forefront of what people have said is cool about machine translation. That makes me incredibly proud and grateful.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"

By Rob Knies, Managing Editor, Microsoft Research People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions. But is the World Wide Web truly worldwide? It\u2019s difficult to make the case. Estimates claim that approximately 70 percent of Web pages today are created […]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[194456],"tags":[186515,215507,215504,215501],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-307538","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","tag-machine-translation","tag-translation-portal","tag-windows-live-translator","tag-world-wide-web","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144736],"related-projects":[],"related-events":[199630],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"March 5, 2008","formattedExcerpt":"By Rob Knies, Managing Editor, Microsoft Research People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions. But is the World Wide Web truly worldwide? It\u2019s difficult to make the case. 
Estimates claim…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/307538"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=307538"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/307538\/revisions"}],"predecessor-version":[{"id":308564,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/307538\/revisions\/308564"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=307538"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=307538"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=307538"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=307538"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=307538"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=307538"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=307538"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-opt
ion?post=307538"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=307538"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=307538"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=307538"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}