{"id":6943,"date":"2008-08-22T09:54:00","date_gmt":"2008-08-22T17:54:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/translation\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/"},"modified":"2008-08-22T09:54:00","modified_gmt":"2008-08-22T17:54:00","slug":"statistical-machine-translation-guest-blog-updated-with-additional-paper","status":"publish","type":"post","link":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/","title":{"rendered":"Statistical Machine Translation – Guest Blog (Updated with additional paper)"},"content":{"rendered":"

Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition.  Today’s guest blog is a high-level explanation of how the engine works:  
\n

As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine.  Statistical systems differ from rule-based ones in that the \u201crules\u201d mapping words and phrases from one language to another are learned by the system rather than hand-coded.  Training an SMT engine requires amassing a large amount of parallel training data, ideally of good quality and from heterogeneous sources, and training the engine on that data.  (By parallel, we mean data in which the content in one language is the same as the content in the other.)  The engine learns the correspondences between words and phrases in one language and those in another, which are reinforced by repeated occurrences of the same words and phrases throughout the input.  For instance, in training the English-German system, if the engine sees the phrase <I>All rights reserved<\/I> on the English side and also notices <I>Alle Rechte vorbehalten<\/I> on the German side, it may align these two phrases and assign some probability to this alignment.  Repeated occurrences of the source and target phrases in the training data will reinforce this alignment.
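
The way repeated co-occurrences strengthen an alignment can be sketched in a few lines of Python. This is only a toy illustration, assuming the phrase pairs have already been extracted and aligned; the function name and the sample data are hypothetical, and the relative-frequency estimate used here is a common textbook choice, not a description of how the Microsoft Translator engine actually computes its probabilities.

```python
from collections import Counter

def phrase_translation_probs(aligned_pairs):
    # Count how often each (source, target) phrase pair was aligned,
    # and how often each source phrase took part in any alignment.
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    # Relative frequency: P(target | source) = count(source, target) / count(source)
    return {
        (src, tgt): n / source_counts[src]
        for (src, tgt), n in pair_counts.items()
    }

# Made-up alignments for illustration; a real system would extract
# these from a large parallel corpus.
pairs = [
    ('All rights reserved', 'Alle Rechte vorbehalten'),
    ('All rights reserved', 'Alle Rechte vorbehalten'),
    ('All rights reserved', 'Alle Rechte vorbehalten'),
    ('All rights reserved', 'Alle Rechte reserviert'),
]
probs = phrase_translation_probs(pairs)
for (src, tgt), p in sorted(probs.items()):
    print(src, '->', tgt, p)
```

In this sketch, each additional sentence pair containing both phrases raises the share of the probability mass assigned to that alignment, which is the reinforcement effect described above.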
\n

Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences).  Some of you asked why we released the English-Spanish system before we released Spanish-English.  There were really two reasons.  First, English-Spanish was the first general-domain language pair we released.  Releasing one language pair allowed us to test the infrastructure before we started releasing more.  Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the necessary infrastructure changes to accommodate it.  In the future, we plan to release new translation systems in pairs (with a couple of exceptions).  I can\u2019t reveal what languages we have planned next, but do expect some new ones soon!
\n

For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them.  Three recent papers of note are:
\n

Chris Quirk, Arul Menezes. Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation<\/B><\/A> May 2006 New York, New York, USA Proceedings of HLT-NAACL 2006<\/I>
\n

Chris Quirk, Arul Menezes. Dependency Treelet Translation: The convergence of statistical and example-based machine translation?<\/A><\/B> March 2006 Machine Translation 43-65 (Attached file)<\/P>
\n

Chris Quirk, Arul Menezes. Using Dependency Order Templates to Improve Generality in Translation<\/STRONG><\/A> July 2007 Association for Computational Linguistics<\/SPAN><\/P><\/p>\n

Dependency Treelet Translation The convergence of statistical and example-based machinetranslation.pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition.  Today’s guest blog is a high level explanation of how the engine works:   As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine.  Statistical systems are different than rule-based ones in that the….<\/span><\/p>\n

CONTINUE READING \"Statistical Machine Translation – Guest Blog (Updated with additional paper)\"<\/span><\/a><\/p>","protected":false},"author":54,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5],"tags":[],"acf":[],"yoast_head":"Statistical Machine Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Statistical Machine Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog\" \/>\n<meta property=\"og:description\" content=\"Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition.  Today’s guest blog is a high level explanation of how the engine works:   As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine.  
Statistical systems are different than rule-based ones in that the....\" \/>\n<meta property=\"og:url\" content=\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Translator Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsofttranslator\" \/>\n<meta property=\"article:published_time\" content=\"2008-08-22T17:54:00+00:00\" \/>\n<meta name=\"author\" content=\"Microsoft Translator\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@mstranslator\" \/>\n<meta name=\"twitter:site\" content=\"@mstranslator\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Microsoft Translator\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#article\",\"isPartOf\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\"},\"author\":{\"name\":\"Microsoft Translator\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/0a163e1bf796b3bb651085032849cf37\"},\"headline\":\"Statistical Machine Translation – Guest Blog (Updated with additional 
paper)\",\"datePublished\":\"2008-08-22T17:54:00+00:00\",\"dateModified\":\"2008-08-22T17:54:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\"},\"wordCount\":505,\"publisher\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#organization\"},\"articleSection\":[\"Developers\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\",\"url\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\",\"name\":\"Statistical Machine Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog\",\"isPartOf\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#website\"},\"datePublished\":\"2008-08-22T17:54:00+00:00\",\"dateModified\":\"2008-08-22T17:54:00+00:00\",\"breadcrumb\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https://www.microsoft.com\/en-us\/translator/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Statistical Machine Translation – Guest Blog (Updated with additional 
paper)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#website\",\"url\":\"https://www.microsoft.com\/en-us\/translator/blog\/\",\"name\":\"Microsoft Translator Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https://www.microsoft.com\/en-us\/translator/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#organization\",\"name\":\"Microsoft Corporation\",\"url\":\"https://www.microsoft.com\/en-us\/translator/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/logo\/image\/\",\"url\":\"https://www.microsoft.com\/en-us\/translator/blog\/wp-content\/uploads\/sites\/13\/2021\/05\/microsoft_logo_element-300x300-1.png\",\"contentUrl\":\"https://www.microsoft.com\/en-us\/translator/blog\/wp-content\/uploads\/sites\/13\/2021\/05\/microsoft_logo_element-300x300-1.png\",\"width\":300,\"height\":300,\"caption\":\"Microsoft Corporation\"},\"image\":{\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.youtube.com\/playlist?list=PLD7HFcN7LXRd4kd2XgZjIbQ8TwTC32Zc9\",\"https:\/\/www.facebook.com\/microsofttranslator\",\"https:\/\/twitter.com\/mstranslator\"]},{\"@type\":\"Person\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/0a163e1bf796b3bb651085032849cf37\",\"name\":\"Microsoft 
Translator\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/978e1fd70e1a6177e5cb285daa5ad026?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/978e1fd70e1a6177e5cb285daa5ad026?s=96&d=mm&r=g\",\"caption\":\"Microsoft Translator\"},\"url\":\"https://www.microsoft.com\/en-us\/translator/blog\/author\/mtteam\/\"}]}<\/script>","yoast_head_json":{"title":"Statistical Machine Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/","og_locale":"en_US","og_type":"article","og_title":"Statistical Machine Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog","og_description":"Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition.  Today’s guest blog is a high level explanation of how the engine works:   As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine.  
Statistical systems are different than rule-based ones in that the....","og_url":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/","og_site_name":"Microsoft Translator Blog","article_publisher":"https:\/\/www.facebook.com\/microsofttranslator","article_published_time":"2008-08-22T17:54:00+00:00","author":"Microsoft Translator","twitter_card":"summary_large_image","twitter_creator":"@mstranslator","twitter_site":"@mstranslator","twitter_misc":{"Written by":"Microsoft Translator","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#article","isPartOf":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/"},"author":{"name":"Microsoft Translator","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/0a163e1bf796b3bb651085032849cf37"},"headline":"Statistical Machine Translation – Guest Blog (Updated with additional paper)","datePublished":"2008-08-22T17:54:00+00:00","dateModified":"2008-08-22T17:54:00+00:00","mainEntityOfPage":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/"},"wordCount":505,"publisher":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/#organization"},"articleSection":["Developers"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/","url":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/","name":"Statistical Machine 
Translation - Guest Blog (Updated with additional paper) - Microsoft Translator Blog","isPartOf":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/#website"},"datePublished":"2008-08-22T17:54:00+00:00","dateModified":"2008-08-22T17:54:00+00:00","breadcrumb":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/"]}]},{"@type":"BreadcrumbList","@id":"https://www.microsoft.com\/en-us\/translator/blog\/2008\/08\/22\/statistical-machine-translation-guest-blog-updated-with-additional-paper\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https://www.microsoft.com\/en-us\/translator/blog\/"},{"@type":"ListItem","position":2,"name":"Statistical Machine Translation – Guest Blog (Updated with additional paper)"}]},{"@type":"WebSite","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#website","url":"https://www.microsoft.com\/en-us\/translator/blog\/","name":"Microsoft Translator Blog","description":"","publisher":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https://www.microsoft.com\/en-us\/translator/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#organization","name":"Microsoft 
Corporation","url":"https://www.microsoft.com\/en-us\/translator/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/logo\/image\/","url":"https://www.microsoft.com\/en-us\/translator/blog\/wp-content\/uploads\/sites\/13\/2021\/05\/microsoft_logo_element-300x300-1.png","contentUrl":"https://www.microsoft.com\/en-us\/translator/blog\/wp-content\/uploads\/sites\/13\/2021\/05\/microsoft_logo_element-300x300-1.png","width":300,"height":300,"caption":"Microsoft Corporation"},"image":{"@id":"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.youtube.com\/playlist?list=PLD7HFcN7LXRd4kd2XgZjIbQ8TwTC32Zc9","https:\/\/www.facebook.com\/microsofttranslator","https:\/\/twitter.com\/mstranslator"]},{"@type":"Person","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/0a163e1bf796b3bb651085032849cf37","name":"Microsoft Translator","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https://www.microsoft.com\/en-us\/translator/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/978e1fd70e1a6177e5cb285daa5ad026?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/978e1fd70e1a6177e5cb285daa5ad026?s=96&d=mm&r=g","caption":"Microsoft 
Translator"},"url":"https://www.microsoft.com\/en-us\/translator/blog\/author\/mtteam\/"}]}},"_links":{"self":[{"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/posts\/6943"}],"collection":[{"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/users\/54"}],"replies":[{"embeddable":true,"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/comments?post=6943"}],"version-history":[{"count":0,"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/posts\/6943\/revisions"}],"wp:attachment":[{"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/media?parent=6943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/categories?post=6943"},{"taxonomy":"post_tag","embeddable":true,"href":"https://www.microsoft.com\/en-us\/translator/blog\/wp-json\/wp\/v2\/tags?post=6943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}