Research Archives - Microsoft Translator Blog
http://approjects.co.za/?big=en-us/translator/blog/category/research/

Azure AI Custom Translator Neural Dictionary: Delivering Higher Terminology Translation Quality
http://approjects.co.za/?big=en-us/translator/blog/2023/12/06/azure-ai-custom-translator-neural-dictionary-delivering-higher-terminology-translation-quality/
Wed, 06 Dec 2023


Today, we are super excited to announce the release of neural dictionary, a significant translation quality improvement to our platform. In this blog post, we will explore the neural dictionary feature.

Introduction  

Neural dictionary is an extension to our dynamic dictionary and phrase dictionary features in Azure AI Translator. Both allow our users to customize the translation output by providing their own translations for specific terms or phrases. Our previous method used a verbatim dictionary, which performed an exact find-and-replace operation. Neural dictionary improves translation quality for sentences that include one or more term translations by letting the machine translation model adjust both the term and its context, producing a more fluent translation while preserving high term translation accuracy.

The following English-German example demonstrates the differences in translation output between the two methods when a custom terminology translation is requested:

Input:   Basic Knowledge of <mstrans:dictionary translation="regelmäßiges Testen">Periodic Maintenance</mstrans:dictionary>
Verbatim dictionary:   Grundkenntnisse der regelmäßiges Testen 
Neural dictionary:   Grundkenntnisse des regelmäßigen Testens 
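To make this concrete, here is a minimal sketch of how such a request can be sent to the Translator Text API (v3.0) from Python; the key and region are placeholders, and the exact output depends on the deployed model:

```python
import requests

# Placeholders: substitute your own Translator resource key and region.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"
KEY = "<your-translator-key>"
REGION = "<your-resource-region>"

# The <mstrans:dictionary> tag requests a specific translation for the
# enclosed term; with neural dictionary, the term is inflected to fit.
text = ('Basic Knowledge of <mstrans:dictionary translation="regelmäßiges '
        'Testen">Periodic Maintenance</mstrans:dictionary>')

response = requests.post(
    ENDPOINT,
    params={"api-version": "3.0", "from": "en", "to": "de"},
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Ocp-Apim-Subscription-Region": REGION,
        "Content-Type": "application/json",
    },
    json=[{"Text": text}],
)
print(response.json()[0]["translations"][0]["text"])
```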

Quality improvement 

The chart below illustrates the significant improvements the new feature brings on common publicly available terminology test sets in Automotive (https://aclanthology.org/2021.eacl-main.271), Health (https://aclanthology.org/2021.emnlp-main.477) and Covid-19 domains (https://aclanthology.org/2021.wmt-1.69) using our general translation models. 

We also conducted a series of customer evaluations on the Custom Translator platform with neural dictionary models. We measured the translation quality gains on customer data between models with and without the neural dictionary extension. Five customers participated, covering German, Spanish, and French across different business domains.

The chart below shows the average COMET improvement in the education domain for English-German, English-Spanish, and English-French: general models on the left and customized models on the right. Blue bars represent general translation quality without neural dictionary, and orange bars represent translation quality using neural dictionary. These are overall average improvements on the entire test sets. For segments that include one or more customer dictionary entries (between 19% and 63% of segments), the improvement is as high as +6.3 to +12.9 COMET points.

Supported languages

  • Currently available (as of December 6, 2023): Chinese Simplified, French, German, Italian, Japanese, Korean, Polish, Russian, Spanish, and Swedish – to and from English.
  • We are adding more in the future. For updates, refer to the Custom Translator release notes.

How neural dictionary works 

Neural dictionary does not employ the exact find-and-replace operation when handling custom terminology translation. Instead, it translates terms or phrases from the dictionary in a way that best fits the entire context. This means that the term can be inflected or given different casing, or that the surrounding words can be adjusted, producing a more fluent and coherent translation.

For example, take the following input sentence in English; its translation into Polish without any dictionary phrases is as follows:

Input:   We need a fast solution that will be understandable.  
Standard translation:   Potrzebujemy szybkiego rozwiązania, które będzie zrozumiałe.  

If you want to make sure that “solution” is translated as “alternatywa” (“an alternative” in English), you can add a dynamic dictionary annotation to achieve that:  

Input:   We need a fast <mstrans:dictionary translation="alternatywa">solution</mstrans:dictionary> that will be understandable.
Verbatim dictionary:   Potrzebujemy szybkiego alternatywa, który będzie zrozumiały.  
Neural dictionary:   Potrzebujemy szybkiej alternatywy, która będzie zrozumiała.  

The output produced by the previous method is not fluent, as grammatical gender agreement is violated. The neural dictionary produces fluent output by a) inflecting the requested replacement and b) changing the surrounding words where needed. It can also change the casing in some cases, as in the following example:

Input:   This company’s <mstrans:dictionary translation="akcje">stock</mstrans:dictionary> is cheap.
Verbatim dictionary:   akcje tej firmy jest tani.  
Neural dictionary:   Akcje tej firmy tanie.   

Neural dictionary expects that the requested translation of a term is provided in its base grammatical form. Multi-word terms are also supported and should be provided as noun phrases, i.e., words should not be lemmatized independently (for example, “Estonian parliamentary election” will be better than “Estonia parliament election”). 
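As a rough illustration of how an application might produce such annotations, here is a small hypothetical helper (the function name and approach are ours, not part of the Translator SDK) that wraps known source terms in dynamic dictionary tags:

```python
import re

def annotate_terms(text: str, terminology: dict[str, str]) -> str:
    """Wrap each known source term in a <mstrans:dictionary> tag.

    `terminology` maps source terms to requested translations, given in
    base grammatical form (multi-word terms as noun phrases, per the
    guidance above).
    """
    for term, translation in terminology.items():
        tag = (f'<mstrans:dictionary translation="{translation}">'
               f"{term}</mstrans:dictionary>")
        text = re.sub(rf"\b{re.escape(term)}\b", tag, text)
    return text

print(annotate_terms(
    "We need a fast solution that will be understandable.",
    {"solution": "alternatywa"},
))
```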

How to enable neural dictionary 

For all supported languages listed above, neural dictionary is immediately available to all customers using the Custom Translator platform with phrase dictionaries. A full (or dictionary-only) custom model retraining is required to enable neural dictionary.

Recommendations

  1. If you want to ensure that a phrase dictionary entry is used more often when working with neural dictionary, consider adding the phrase entry with the source part in various forms (see the sketch after this list). In the above example, next to “solution _ alternatywa”, you may want to add the following entries as well: “Solution _ alternatywa”, “solutions _ alternatywy”, “Solutions _ alternatywy”.
  2. If the goal is to ensure that a specific word or phrase is copied “as is” from the input text to the output translation when using a phrase dictionary, consider enforcing the verbatim dictionary, as it may be more consistent.
  3. Avoid adding translations of common or frequent words or phrases to the phrase dictionary.
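For the first recommendation, here is a hypothetical helper that generates casing variants of a phrase-dictionary entry; the function name and entry format are ours, and inflected forms still have to be supplied by a person:

```python
def casing_variants(source: str, target: str) -> list[tuple[str, str]]:
    """Generate casing variants of a phrase-dictionary entry.

    The 'source _ target' format mirrors the entries above. Inflected
    forms (e.g. the plural 'alternatywy') cannot be derived
    automatically and must still come from a human.
    """
    variants = {(source, target),
                (source.lower(), target),
                (source.capitalize(), target)}
    return sorted(variants)

for src, tgt in casing_variants("solution", "alternatywa"):
    print(f"{src} _ {tgt}")
```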

To learn more about Custom Translator and how it can help your business thrive in the global marketplace, start with the Custom Translator beginner’s guide. 

What you can do with Microsoft Custom Translator 

Build custom models with your domain-specific terminology and translate in real time using the Microsoft Translator API.

Use Microsoft Custom Translator with your translation solutions to help globalize your business and improve customer interactions. 

For more information, visit Microsoft Translator business solutions and Custom Translator release notes. 

Bing’s gendered translations tackle bias in translation
http://approjects.co.za/?big=en-us/translator/blog/2023/03/08/bings-gendered-translations-tackle-bias-in-translation/
Wed, 08 Mar 2023

Image: 3D rendering of gender symbols.

We’re excited to announce that, as of today, masculine and feminine alternative translations are available when translating from English to Spanish, French, or Italian. You can try out this new feature in both Bing Search and Bing Translator verticals.

Over the last few years, the field of Machine Translation (MT) has been revolutionized by the advent of transformer models, leading to tremendous improvements in quality. However, models optimized to capture the statistical properties of data collected from the real world inadvertently learn or even amplify social biases found in that data.

Our latest release is a step towards reducing one of these biases, specifically the gender bias that is prevalent in MT systems. Bing Translator has always produced a single translation for an input sentence, even when other gender variants, feminine or masculine, were equally valid. In accordance with the Microsoft responsible AI principles, we want to ensure we provide correct alternative translations and are more inclusive to all genders. As part of this journey, our first step is to provide feminine and masculine translation variants.

Gender is expressed differently across different languages. For example, in English, the word lawyer could refer to either a male or female individual, but in Spanish, abogada would refer to a female lawyer, while abogado would refer to a male one. In the absence of information about the gender of a noun like ‘lawyer’ in a source sentence, MT models may resort to selecting an arbitrary gender for the noun in the target language. Often, these arbitrary gender assignments align with stereotypes, perpetuating harmful societal bias (Stanovsky et al., 2019; Ciora et al., 2021) and leading to translations that are not fully accurate.

In the example below, notice that while translating a gender-neutral sentence from English to Spanish, the translated text follows the stereotypical gender role: “lawyer” is translated as male.

Image: Screenshot of the English text “Let’s get our lawyer’s opinion on this issue.” translated into Spanish with gender bias.

As there is no context in the source sentence that implies the gender of the lawyer, producing a translation with the assumption of either a male or female lawyer would both be valid. Now, Bing Translator produces translations with both feminine and masculine forms.

Image: Screenshot of the English text “Let’s get our lawyer’s opinion on this issue.” translated into Spanish with gender-specific translations.

System design

We aimed to design our system to meet the following key criteria for providing gendered alternatives:

  1. The feminine and masculine variants should differ only where needed to convey gender.
  2. The system should cover a wide range of sentences where multiple gendered alternatives are possible.
  3. The translations should preserve the meaning of the original source sentence.

Detecting gender ambiguity

In order to accurately detect gender ambiguity in source text, we utilize a coreference model to analyze inputs containing animate nouns. For instance, if a given input text contains a gender-neutral profession word, we only want to provide gendered alternatives for it when its gender can’t be determined by other information in the sentence. For example, when translating the English sentence “The lawyer met her driver at the hotel lobby.” into French, we can determine that the lawyer is female, while the gender of the driver is unknown.

Image: Screenshot of the English text “The lawyer met her driver at the hotel lobby.” translated into French.
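A minimal sketch of this ambiguity check, with a stubbed coreference resolver standing in for the production model (the function names, toy lexicons, and hard-coded cluster are purely illustrative):

```python
# Toy stand-ins for the production coreference model and lexicons.
AMBIGUOUS_NOUNS = {"lawyer", "driver"}
PRONOUN_GENDER = {"her": "feminine", "his": "masculine"}

def coref_clusters(tokens):
    """Stub coreference resolver: for the example sentence it links
    'her' (index 3) to 'lawyer' (index 1). A real system runs a
    trained coreference model here."""
    return [{1, 3}]

def gender_ambiguous_nouns(tokens):
    resolved = set()
    for cluster in coref_clusters(tokens):
        # If any mention in the cluster is a gendered pronoun, the
        # whole cluster's gender is recoverable from context.
        if any(tokens[i] in PRONOUN_GENDER for i in cluster):
            resolved |= cluster
    return [t for i, t in enumerate(tokens)
            if t in AMBIGUOUS_NOUNS and i not in resolved]

tokens = "the lawyer met her driver at the hotel lobby".split()
print(gender_ambiguous_nouns(tokens))  # ['driver'] -- 'lawyer' is resolved
```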

Generating alternate translation

When the source sentence is ambiguously gendered, we examine our translation system’s output to decide if an alternative gender interpretation is possible. If so, we proceed to determine the best way to revise the translation. We begin by constructing a set of candidate target translations by rewriting the original translation. We apply linguistic constraints based on dependency relations to ensure consistency in the proposed alternatives and prune the erroneous candidates.

However, in many cases, even after applying our constraints, we are left with multiple candidate rewrites for the gendered alternative translation. To determine the best option, we evaluate each candidate by scoring it with our translation model. By leveraging the fact that a good gender rewrite will also be an accurate translation of the source sentence, we are able to ensure high accuracy in our final output.

Image: A diagram showing the system design of gender re-inflection.
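The shape of that rescoring step can be sketched as follows; `score` stands in for the translation model’s length-normalized log-probability, and the names and toy scorer are ours:

```python
def best_gender_rewrite(source, original_translation, candidates, score):
    """Pick the candidate rewrite the translation model itself scores
    highest as a translation of `source`.

    `candidates` are rewrites that survived the dependency-based
    consistency constraints; `score(source, hypothesis)` stands in for
    the model's length-normalized log-probability.
    """
    if not candidates:
        return original_translation
    return max(candidates, key=lambda c: score(source, c))

# Toy usage with a dummy scorer; a real system calls the NMT model.
dummy_score = lambda src, hyp: -abs(len(hyp) - len(src))
print(best_gender_rewrite(
    "The lawyer arrived.",
    "El abogado llegó.",
    ["La abogada llegó.", "La abogado llegó."],
    dummy_score,
))
```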

Leveraging managed online endpoints in Azure Machine Learning

The gendered alternative feature in Bing is hosted on managed online endpoints in Azure Machine Learning. Managed online endpoints provide a unified interface to invoke and manage model deployments on Microsoft-managed compute in a turnkey manner. They enable us to take advantage of scalable and reliable endpoints without being concerned about infrastructure management. This inference environment also enables the processing of large numbers of requests with low latency. Our ability to create and deploy the gender debias service with the latest frameworks and technologies has been greatly improved through the use of managed inference features in Azure Machine Learning. By leveraging these features, we have been able to maintain low COGS (Cost of Goods Sold) and ensure straightforward security and privacy compliance.
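For readers unfamiliar with managed online endpoints, invoking one generally looks like the sketch below; the scoring URI, key, and payload schema are placeholders, since the real schema depends on the deployed scoring script:

```python
import requests

# Everything below is a placeholder: the scoring URI and key come from
# your own Azure ML workspace, and the payload schema is whatever the
# deployed scoring script expects (this one is invented).
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
ENDPOINT_KEY = "<endpoint-key>"

payload = {"text": "Let's get our lawyer's opinion on this issue.",
           "source_language": "en", "target_language": "es"}

response = requests.post(
    SCORING_URI,
    headers={"Authorization": f"Bearer {ENDPOINT_KEY}",
             "Content-Type": "application/json"},
    json=payload,
)
print(response.json())
```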

How can you contribute?

To facilitate progress in gender bias reduction in MT, we are releasing a test corpus containing gender-ambiguous translation examples from English into Spanish, French and Italian. Each English source sentence is accompanied by multiple translations, covering each possible gender variation.

Our test set is constructed to be challenging, morphologically rich, and linguistically diverse. This corpus has been instrumental in our development process. It was developed with the help of bilingual linguists with significant translation experience. We are also releasing a technical paper that discusses the test corpus in detail, along with the methodology and tools for evaluation.

GATE: A challenge set for Gender-Ambiguous Translation Examples – Paper

GATE: A challenge set for Gender-Ambiguous Translation Examples – Test set

Path forward

Through this work we aim to improve the quality of MT output in cases of ambiguous source gender, as well as facilitate the development of better and more inclusive natural language processing (NLP) tools in general. Our initial release focuses on translating from English to Spanish, French, and Italian. Going forward, we plan to expand to new language pairs, as well as cover additional scenarios and types of biases.

Credits:

Ranjita Naik, Spencer Rarrick, Sundar Poudel, Varun Mathur, Jeshwanth Kumar Chandrala, Charan Mohan, Lee Schwartz, Steven Nguyen, Amit Bhagwat, Vishal Chowdhary.

Breakthrough Z-Code Mixture of Experts models now live in Translator
http://approjects.co.za/?big=en-us/translator/blog/2022/03/22/breakthrough-z-code-mixture-of-experts-models-now-live-in-translator/
Tue, 22 Mar 2022

Translator is now adopting Z-Code models, a breakthrough AI technology that significantly improves the quality of production translation models.  Z-code models utilize a new architecture called Mixture of Experts (MoE) which enables models to learn to translate between multiple languages at the same time.  This opens the way to high quality machine translation beyond high-resource languages and improves the quality of low-resource languages that lack significant training data.

Z-code models are available now by invitation to customers using the Document Translation feature, and they will be made available to all customers and to other Translator products in phases. Please fill out this form to request access to Document Translation using Z-code models.

You can read more about this news in the Microsoft AI announcement blog and the Microsoft Research blog.

Multilingual translation at scale: 10000 language pairs and beyond
http://approjects.co.za/?big=en-us/translator/blog/2021/11/22/multilingual-translation-at-scale-10000-language-pairs-and-beyond/
Mon, 22 Nov 2021


Microsoft is on a quest for AI at Scale with high ambition to enable the next generation of AI experiences. The Microsoft Translator ZCode team is working together with Microsoft Project Turing and Microsoft Research Asia to advance language and multilingual support at the core of this initiative. We continue to push frontiers with multilingual models to support various language scenarios across Microsoft. Last summer, we announced our large-scale Multilingual Mixture of Experts model with DeepSpeed, which can outperform individual large-scale bilingual models. Recently, the latest Turing universal language representation model (T-ULRv5), a Microsoft-created model, once again achieved state of the art, topping the Google XTREME public leaderboard at the time. More recently, Microsoft announced the largest Megatron-Turing NLG model, with 530 billion parameters.

The annual Conference on Machine Translation (aka WMT 2021) concluded last week in beautiful Punta Cana, Dominican Republic. WMT brings together researchers from across the entire Machine Translation field, both industry and academia, to participate in a series of shared tasks, each defining a benchmark in an important area of machine translation to push the field into new frontiers.

The Microsoft Translator ZCode team, working together with the Turing team and Microsoft Research Asia, competed in the “Large-scale Multilingual Translation” track, which consisted of a Full Task of translating between all 10,000 directions across 101 languages, and two Small Tasks: one focused on 5 Central and Southern European languages, and one on 5 Southeast Asian languages. The Microsoft ZCode-DeltaLM model won all three tasks by huge margins, including an incredible 10+ point gain over the M2M100 model in the large task, evaluated on a massive 10,000 language pairs. (Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation, Wenzek et al., WMT 2021).

Figure 1: Official Results (BLEU scores) on the Full-Task and the Small-Task1 at the WMT 2021 Large Scale Multilingual Translation shared task

The ZCode-DeltaLM approach

In this blog post, let’s take a look under the hood at the winning Microsoft ZCode-DeltaLM model. Our starting point was DeltaLM (DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders), the latest in the increasingly powerful series of massively multilingual pretrained language models from Microsoft.


DeltaLM is an encoder-decoder model, but instead of training from scratch, it is initialized from a previously pretrained state-of-the-art encoder-only model, specifically TULRv3. While initializing the encoder is straightforward, the decoder is less so, since it adds cross-attention to the encoder’s self-attention. DeltaLM solves this problem with a novel interleaved architecture, where self-attention and cross-attention alternate between layers: self-attention is used in the odd layers and cross-attention in the even layers. With this interleaving, the decoder structure matches the encoder, and so it can also be initialized the same way from TULRv3.
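A toy PyTorch sketch of that interleaving (dimensions, layer counts, and module structure are our simplifications, not the DeltaLM implementation):

```python
import torch
import torch.nn as nn

class InterleavedDecoderLayer(nn.Module):
    """Toy version of DeltaLM-style interleaving: odd layers run
    self-attention, even layers run cross-attention, so every decoder
    attention block has a matching encoder block to initialize from."""

    def __init__(self, d_model: int, n_heads: int, layer_index: int):
        super().__init__()
        self.is_cross = layer_index % 2 == 0  # even layers attend to the encoder
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x, encoder_out):
        kv = encoder_out if self.is_cross else x  # cross- vs self-attention
        attn_out, _ = self.attn(x, kv, kv)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))

layers = [InterleavedDecoderLayer(512, 8, i) for i in range(1, 13)]
x = torch.randn(2, 7, 512)    # decoder states: (batch, tgt_len, d_model)
enc = torch.randn(2, 9, 512)  # encoder output: (batch, src_len, d_model)
for layer in layers:
    x = layer(x, enc)
print(x.shape)  # torch.Size([2, 7, 512])
```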

DeltaLM is augmented with ZCode’s powerful multitask learning (Multi-task Learning for Multilingual Neural Machine Translation). Our models show that combining multitask and multilingual learning can significantly improve training for large-scale pretrained language models. This multitask, multilingual learning paradigm leverages the inductive bias and regularization from several tasks and languages simultaneously to perform better on various downstream tasks. We use a translation task, a denoising autoencoder task, and a translation span corruption task.

Winning the massively multilingual translation track

To build our winning massively multilingual translation system (Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task), we started with ZCode-DeltaLM and added a few tricks:

  • Progressive learning: we first train a model with 24 encoder layers and 12 decoder layers, then continue training with 12 added encoder layers, resulting in a deep 36-layer encoder.
  • Dual-pseudo-parallel data: to cover all language pairs, we generate data where both sides of the parallel pair are synthetic, translated by the model from English.
  • Iterative back-translation: we also apply iterative back-translation to generate synthetic data.
  • Curriculum learning: we start with the entire noisy training data, then reduce it to a clean subset.
  • Objective re-weighting: we re-weight the translation objective to favor parallel data over the back-translation and dual-pseudo-parallel data.
  • Temperature sampling: we apply temperature sampling to balance across language pairs (a small sketch follows below).
  • Direct vs. pivot selection: for each language pair, we choose, based on the dev set, whether to prefer direct translation or pivot translation through English.
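Temperature sampling, mentioned in the list above, is the standard multilingual NMT trick of sampling language pair i with probability proportional to (n_i/N)^(1/T), which flattens the data distribution so low-resource pairs are seen more often. A small sketch with made-up corpus sizes:

```python
def temperature_sampling_probs(sizes: dict[str, int], T: float = 5.0):
    """p_i is proportional to (n_i / N) ** (1 / T); T = 1 is plain
    proportional sampling, larger T moves toward uniform."""
    total = sum(sizes.values())
    weights = {pair: (n / total) ** (1.0 / T) for pair, n in sizes.items()}
    z = sum(weights.values())
    return {pair: w / z for pair, w in weights.items()}

sizes = {"en-fr": 40_000_000, "en-is": 500_000, "en-ps": 50_000}  # made up
for pair, p in temperature_sampling_probs(sizes).items():
    print(f"{pair}: {p:.3f}")
```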

Putting it all together, we knew we had an amazing massively multilingual system, but the official results on the blind test set exceeded our expectations. We scored 2.5 to 9 BLEU points ahead of the next competitor, and 10 to 21 BLEU points ahead of the baseline M2M-175 model. On the dev set we compared against the larger M2M-615 model, which we also beat by 10 to 18 points.

Beyond Translation: Universal Language Generation

While we are excited about the big win at WMT 2021, what’s even more exciting is that unlike the other competitors, our ZCode-DeltaLM model is not just a translation model, but rather a general pretrained encoder-decoder language model, usable for all kinds of generation tasks beyond translation. This really enables our models to perform quite well on various multilingual natural language generation tasks.

We reached a new SOTA in many popular generation tasks from the GEM benchmark, including Wikilingua (summarization), text simplification (WikiAuto), and structure-to-text (WebNLG). The ZCode-DeltaLM model widely outperforms much larger models such as mT5 XL (3.7B), which is also trained on much larger data. This demonstrates the efficiency and versatility of the models, leading to strong performance across many tasks.

Figure 2. Performance (RL scores) of ZCode-DeltaLM on the Summarization and Text Simplification tasks in the GEM benchmark

Looking Ahead

Multilingual machine translation has reached a point where it performs very well, exceeding bilingual systems, on both low- and high-resource languages. Mixture of Experts (MoE) models have been shown, for example in GShard, to be a very good fit for scaling up such models. We explore how to efficiently scale such models with Mixture of Experts: Scalable and Efficient MoE Training for Multitask Multilingual Models. MoE models with massive multilingual data and unsupervised multitask training present an unprecedented opportunity to provide truly universal systems that can further enable the Microsoft Translator team to eliminate language barriers across the world, as well as support a variety of natural language generation tasks.

Acknowledgements

We would like to acknowledge and thank Francisco Guzmán and his team, who collected the massively multilingual FLORES test set and organized this WMT track with such a large-scale evaluation.

Translator now translates more than 100 languages
http://approjects.co.za/?big=en-us/translator/blog/2021/10/11/translator-now-translates-more-than-100-languages/
Mon, 11 Oct 2021

Today, we added 12 new languages and dialects to the Microsoft Translator service – Bashkir, Dhivehi, Georgian, Kyrgyz, Macedonian, Mongolian (Cyrillic), Mongolian (Traditional), Tatar, Tibetan, Turkmen, Uyghur, and Uzbek (Latin) – bringing the total number of languages available in Translator to 103.

You can read more about this news in the Microsoft AI announcement blog, the Microsoft Research blog, and the Azure Tech Community Blog.

Microsoft Translator releases literary Chinese translation
http://approjects.co.za/?big=en-us/translator/blog/2021/08/25/microsoft-translator-releases-literary-chinese-translation/
Thu, 26 Aug 2021

When reading ancient Chinese poetry, we often marvel at the wonderful words ancient writers could use to describe people, events, objects, and scenes. This is a splendid cultural treasure that has been left behind for us. However, similar to Shakespeare’s verses in the English language, the literary Chinese used by these poets is often difficult for modern-day people to understand, and the meanings and subtleties embedded within it are frequently lost.

To solve this problem, researchers at Microsoft Research Asia adopted the latest neural machine translation techniques to train direct translation models between literary Chinese and modern Chinese, which also creates translation capabilities between literary Chinese and more than 90 other languages and dialects in Microsoft Translator. Currently, literary Chinese translation has been integrated into the Microsoft Translator app, Azure Cognitive Services Translator, and a number of Microsoft products that are supported by Microsoft Translator services.

Image: The painting from “West Mountain in Misty Rain” by Shen Zhou, Ming Dynasty. The ancient Chinese poem on the painting is from Yong Liu, Northern Song Dynasty. The poem depicts the spring scenery in southern China during the Qingming Festival and the prosperity of social life.

Enabling more people to appreciate the charm of traditional Chinese culture 

Literary Chinese is an important carrier of traditional Chinese culture. Voluminous books and texts from ancient times have recorded China’s rich and profound culture over the past five thousand years. The thoughts and wisdom accumulated and contained in them are worthy of continuous exploration and thinking.

With the help of machine translation, tourists can now understand ancient Chinese texts and poems written on historic buildings and monuments, students now have an extra tool to help them learn Chinese, and researchers who are engaged in collating and translating ancient texts can be more productive.     

Dongdong Zhang, a principal researcher at Microsoft Research Asia, said, “From a technical perspective, literary Chinese can be regarded as a separate language. Once translation between literary Chinese and modern Chinese is realized, the translation between literary Chinese and other languages such as English, French, and German becomes a matter of course.”  

The biggest difficulty for a literary Chinese translation AI model: little training data

The most critical element of AI model training is data. Only when the data volume is large enough and its quality high enough can you train a more accurate model. In machine translation, model training requires bilingual data: original text data and target-language data. The translation of literary Chinese is very special, as it’s not a language used in daily life. Therefore, compared with the translation of other languages, the training data for literary Chinese translation is very small, which is not conducive to the training of machine translation models.

Although Microsoft Research Asia researchers collected a lot of publicly available literary and modern Chinese data in the early stages, the original data cannot be used directly. Data cleaning needs to be conducted to normalize data from different sources and various formats, as well as full-width/half-width punctuation, to minimize the interference of invalid data on model training. In this way, the actually available high-quality data is further reduced.

According to Shuming Ma, a researcher at Microsoft Research Asia, in order to reduce the data sparseness issue, researchers have conducted a great amount of data synthesis and augmentation work, including: 

First, common-character-based alignment and expansion to increase training data size. Different from translations between Chinese and other languages such as English, French, and Russian, literary Chinese and modern Chinese use the same character set. Taking advantage of this feature, researchers at Microsoft Research Asia have used innovative algorithms to allow machine translation to recall common characters, conduct natural alignment, and then further expand to words, phrases, and short sentences, thereby synthesizing a large amount of usable data.

Second, deform sentence structure to improve the robustness of machine translation. Regarding breaks in texts and poems, researchers have added a number of variants to make machines more comprehensive in learning ancient poems. For people, even when they see a sentence that is structured abnormally, such as a poem segmented into lines based on rhythm rather than full sentences, they can still put the parts together and understand it. But for a translation model that has never seen such segmentation before, it will likely be confused. Therefore, transformation of data format can not only expand the amount of training data, but also improve the robustness of the translation model training.  

Third, conduct traditional and simplified character translation training to increase model adaptability. In Chinese, traditional characters exist in both literary and modern Chinese. When researchers trained the model, in order to improve the adaptability of the model, they not only leveraged data in simplified Chinese, but also added data in traditional Chinese, as well as data mixed with traditional and simplified characters. Thus, the model can understand both the traditional and simplified contents, which leads to more accurate translation results.   

Fourth, increase the training of foreign-language words to improve the accuracy of translation. When translating modern Chinese into literary Chinese, there are often modern words derived from foreign-language words and new words that have never appeared in ancient Chinese, such as “Microsoft”, “computer”, and “high-speed rail”, and many others like them. To deal with this issue, researchers trained a small model to recognize entities. The model first translated the meaning of the words outside the entity, then filled the entity back in to ensure the accuracy of the machine’s processing of the foreign words.

Image: The literary Chinese translation process
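The entity handling described above might look roughly like the following sketch; the placeholder scheme, function names, and dummy models are ours:

```python
def translate_with_entities(text, recognize_entities, translate):
    """Mask entities, translate the rest, then restore the entities.

    `recognize_entities(text)` stands in for the small NER model
    described above; `translate(text)` stands in for the MT model.
    """
    entities = recognize_entities(text)
    masked = text
    for i, entity in enumerate(entities):
        masked = masked.replace(entity, f"<ENT{i}>")
    translated = translate(masked)
    for i, entity in enumerate(entities):
        translated = translated.replace(f"<ENT{i}>", entity)
    return translated

# Toy usage: pretend NER found "Microsoft" and "translation" is just
# uppercasing; the placeholder passes through the "model" untouched.
print(translate_with_entities(
    "Microsoft builds high-speed rail simulations.",
    lambda t: ["Microsoft"],
    str.upper,
))
```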

In addition, for informal writing styles such as blogs, forums, Weibo, and so on, the machine translation model has been trained specifically to further improve the robustness of translation between modern and literary Chinese.  

Dongdong Zhang said, “Based on the current translation system, we will continue to enrich the data set and improve the model training method to make it more robust and versatile. In the future, the method may not only be used for literary Chinese translation, but can also be extended to other application scenarios.”

Neural Machine Translation Enabling Human Parity Innovations In the Cloud
http://approjects.co.za/?big=en-us/translator/blog/2019/06/17/neural-machine-translation-enabling-human-parity-innovations-in-the-cloud/
Mon, 17 Jun 2019

In March 2018 we announced (Hassan et al. 2018) a breakthrough result where we showed for the first time a Machine Translation system that could perform as well as human translators (in a specific scenario – Chinese-English news translation). This was an exciting breakthrough in Machine Translation research, but the system we built for this project was a complex, heavyweight research system, incorporating multiple cutting-edge techniques. While we released the output of this system on several test sets, the system itself was not suitable for deployment in a real-time machine translation cloud API.

Today we are excited to announce the availability in production of our latest generation of neural Machine Translation models. These models incorporate most of the goodness of our research system and are now available by default when you use the Microsoft Translator API. These new models are available today in Chinese, German, French, Hindi, Italian, Spanish, Japanese, Korean, and Russian, from and to English. More languages are coming soon.

Getting from Research Paper to Cloud API

Over the past year, we have been looking for ways to bring much of the quality of our human-parity system into the Microsoft Translator API, while continuing to offer low-cost real-time translation. Here are some of the steps on that journey.

Teacher-Student Training

Our first step was to switch to a “teacher-student” framework, where we train a lightweight real-time student to mimic a heavyweight teacher network (Ba and Caruana 2014). This is accomplished by training the student not on the parallel data that MT systems are usually trained on, but on translations produced by the teacher (Kim and Rush 2016). This is a simpler task than learning from raw data, and allows a shallower, simpler student to very closely follow the complex teacher. As one might expect, our initial attempts still suffered quality drops from teacher to student (no free lunch!), but we nevertheless took first place in the WNMT 2018 Shared Task on Efficient Decoding (Junczys-Dowmunt et al. 2018a). Some particularly exciting results from this effort were that Transformer (Vaswani et al. 2017) models and their modifications play well with teacher-student training and are astoundingly efficient during inference on the CPU.

Learning from these initial results and after a lot of iteration we discovered a recipe that allows our simple student to have almost the same quality as the complex teacher (sometimes there is a free lunch after all?). Now we were free to build large, complex teacher models to maximize quality, without worrying about real-time constraints (too much).
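At its core, the data side of this recipe is simple; here is a sketch of sequence-level distillation corpus construction, with a dummy teacher standing in for the heavyweight model:

```python
def make_distillation_corpus(sources, teacher_translate):
    """Sequence-level knowledge distillation (Kim and Rush 2016):
    the student trains on (source, teacher output) pairs instead of
    the original human references. `teacher_translate` stands in for
    beam-search decoding with the heavyweight teacher model."""
    return [(src, teacher_translate(src)) for src in sources]

# Toy usage; the student then consumes these pairs exactly as if
# they were ordinary parallel training data.
corpus = make_distillation_corpus(
    ["The cat sat.", "Watch US soccer here."],
    lambda s: f"<teacher translation of: {s}>",  # dummy teacher
)
print(corpus[0])
```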

Real-time translation

Our decision to switch to a teacher-student framework was motivated by the great work by Kim and Rush (2016) for simple RNN-based models. At that point it was unclear if the reported benefits would manifest for Transformer models as well (see Vaswani et al. 2017 for details on this model). However, we quickly discovered that this was indeed the case.

The Transformer student could use a greatly simplified decoding algorithm (greedy search) where we just pick the single best translated word at each step, rather than the usual method (beam-search) which involves searching through the huge space of possible translations. This change had minimal quality impact but led to big improvements in translation speed. By contrast, a teacher model would suffer a significant drop in quality when switching from beam-search to greedy-search.
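A minimal sketch of the greedy search loop; `step` stands in for one decoder step of the student model, and the toy logits table exists only to make the example runnable:

```python
def greedy_decode(step, bos_id, eos_id, max_len=64):
    """Greedy search: commit to the single best token at each step.

    `step(prefix)` returns next-token logits for the current target
    prefix. Beam search would instead keep the k highest-scoring
    prefixes alive at every step.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        logits = step(prefix)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix

# Toy 5-token vocabulary, just to exercise the loop (id 1 = EOS).
table = [[0.0, 0.1, 3.0, 0.0, 0.0],
         [0.0, 0.2, 0.0, 2.0, 0.0],
         [0.0, 4.0, 0.0, 0.0, 0.0]]
print(greedy_decode(lambda p: table[min(len(p) - 1, 2)], 0, 1))  # [0, 2, 3, 1]
```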

At the same time, we realized that rather than using the latest neural architecture (Transformer with self-attention) in the decoder, the student could be modified to use a drastically simplified and faster recurrent (RNN) architecture. This matters because while the Transformer encoder can be computed over the whole source sentence in parallel, the target sentence is generated a single word at a time, so the speed of the decoder has a big impact on the overall speed of translation. Compared to self-attention, the recurrent decoder reduces algorithmic complexity from quadratic to linear in target sentence length. Especially in the teacher-student setting, we saw no loss in quality due to these modifications, in either automatic or human evaluation results. Several additional improvements such as parameter sharing led to further reductions in complexity and increased speed.

Another advantage of the teacher-student framework we were very excited to see is that quality improvements over time of the ever growing and changing teachers are easily carried over to a non-changing student architecture. In cases where we saw problems in this regard, slight increases in student model capacity would close the gap again.

Dual Learning

The key insight behind dual learning (He et al. 2016) is the “round-trip translation” check that people sometimes use to check translation quality. Suppose we’re using an online translator to go from English to Italian. If we don’t read Italian, how do we know if it’s done a good job? Before clicking send on an email, we might choose to check the quality by translating the Italian back to English (maybe on a different web site). If the English we get back has strayed too far from the original, chances are one of the translations went off the rails.

Dual learning uses the same approach to train two systems (e.g. English->Italian and Italian->English) in parallel, using the round-trip translation from one system to score, validate and train the other system.

Dual learning was a major contributor to our human-parity research result. In going from the research system to our production recipe, we generalized this approach broadly. Not only did we co-train pairs of systems on each other’s output, we also used the same criterion for filtering our parallel data.
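In sketch form, the round-trip criterion looks like this; the two translation systems and the similarity measure are stand-ins:

```python
def round_trip_score(source, forward_translate, backward_translate, similarity):
    """Score a forward translation by how well its back-translation
    recovers the source -- the 'round-trip check' described above.
    All three callables are stand-ins: two MT systems and a sentence
    similarity measure (e.g. a length-normalized model score)."""
    hypothesis = forward_translate(source)
    reconstruction = backward_translate(hypothesis)
    return similarity(source, reconstruction)

# Toy usage with a trivially invertible "system": a perfect round
# trip scores 1.0; a lossy one scores lower.
flip = str.swapcase
overlap = lambda a, b: sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))
print(round_trip_score("Hello world", flip, flip, overlap))  # 1.0
```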

Cleaning up inaccurate data

Machine translation systems are trained on “parallel data”, i.e. pairs of documents that are translations of each other, ideally created by a human translator. As it turns out, this parallel data is often full of inaccurate translations. Sometimes the documents are not truly parallel but only loose paraphrases of each other. Human translators can choose to leave out some source material or insert additional information. The data can contain typos, spelling mistakes, grammatical errors. Sometimes our data mining algorithms are fooled by similar but non-parallel data, or even by sentences in the wrong language. Worst of all, a lot of the web pages we see are spam, or may in fact be machine translations rather than human translations. Neural systems are very sensitive to this kind of inaccuracy in the data. We found that building neural models to automatically identify and get rid of these inaccuracies gave strong improvements in the quality of our systems. Our approach to data filtering resulted in the first place in the WMT18 parallel corpus filtering benchmark (Junczys-Dowmunt 2018a) and helped build one of the strongest English-German translation systems in the WMT18 News translation task (Junczys-Dowmunt 2018b). We used improved versions of this approach in the production systems we released today.
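The dual conditional cross-entropy filter from the cited benchmark work scores a pair by both the magnitude and the agreement of two models’ word-normalized cross-entropies; the sketch below is our approximate rendering of that idea, so consult the paper for the exact formulation:

```python
import math

def dual_xent_score(h_fwd: float, h_bwd: float) -> float:
    """Approximate dual conditional cross-entropy score (our rendering).

    h_fwd: word-normalized cross-entropy of target given source under
           a forward model, H(y|x) / |y|.
    h_bwd: the same in the reverse direction, H(x|y) / |x|.
    Pairs where both values are low *and* agree score higher;
    disagreement (typical of spam or non-parallel data) drives the
    score toward zero.
    """
    return math.exp(-(abs(h_fwd - h_bwd) + 0.5 * (h_fwd + h_bwd)))

print(round(dual_xent_score(1.2, 1.3), 3))  # plausible pair -> higher
print(round(dual_xent_score(0.4, 6.0), 6))  # models disagree -> near zero
```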

Factored word representations

When moving a research technology to production, several real-world challenges arise. Getting numbers, dates, times, capitalization, spacing, etc. right matters a lot more in production than in a research system.

Consider the challenge of capitalization. If we’re translating the sentence “WATCH CAT VIDEOS HERE”, we know how to translate “cat”, and we would want to translate “CAT” the same way. But now consider “Watch US soccer here”: we don’t want to confuse the word “us” and the acronym “US” in this context.

To handle this, we used an approach known as factored machine translation (Koehn and Hoang 2007; Sennrich and Haddow 2016) which works as follows. Instead of a single numeric representation (“embedding”) for “cat” or “CAT”, we use multiple embeddings, known as “factors”. In this case, the primary embedding would be the same for “CAT” and “cat” but a separate factor would represent the capitalization, showing that it was all-caps in one instance but lowercase in the other. Similar factors are used on the source and the target side.

We use similar factors to handle word fragments and spacing between words (a complex issue in non-spacing or semi-spacing languages such as Chinese, Korean, Japanese or Thai).

Factors also dramatically improved translation of numbers, which is critical in many scenarios. Number translation is mostly an algorithmic transformation. For example, 1,234,000 can be written as 12,34,000 in Hindi, 1.234.000 in German, and 123.4万 in Chinese. Traditionally, numbers are represented like words, as groups of characters of varying length. This makes it hard for machine learning to discover the algorithm. Instead, we feed every single digit of a number separately, with factors marking beginning and end. This simple trick robustly and reliably removed nearly all number-translation errors.
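A toy sketch of how such factored tokens might be produced; the factor names and the begin/inside/end scheme are our illustration, not the production format:

```python
def factor_token(token: str):
    """Split a surface token into (embedding key, factor) pairs.

    Illustrative scheme: words get a lowercased lemma plus a
    capitalization factor; digit strings are exploded into single
    digits with begin/inside/end markers.
    """
    if token.isdigit():
        factored = []
        for i, digit in enumerate(token):
            pos = "B" if i == 0 else ("E" if i == len(token) - 1 else "I")
            factored.append((digit, f"digit|{pos}"))
        return factored
    caps = ("all_caps" if len(token) > 1 and token.isupper()
            else "cap_initial" if token[:1].isupper() else "lower")
    return [(token.lower(), f"caps|{caps}")]

print(factor_token("CAT"))      # [('cat', 'caps|all_caps')]
print(factor_token("cat"))      # [('cat', 'caps|lower')]
print(factor_token("1234000"))  # per-digit tokens with B/I/E factors
```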

Faster model training

When we’re training a single system towards a single goal, as we did for the human-parity research project, we can throw vast amounts of hardware at models that take weeks to train. When training production models for 20+ language pairs, this approach becomes untenable. Not only do we need reasonable turn-around times, but we also need to moderate our hardware demands. For this project, we made a number of performance improvements to Marian NMT (Junczys-Dowmunt et al. 2018b).

Marian NMT is the open-source Neural MT toolkit that Microsoft Translator is based on. Marian is a pure C++ neural machine translation toolkit and, as a result, extremely efficient: it does not require GPUs at runtime and is very efficient at training time.

Due to its self-contained nature, it is quite easy to optimize Marian for NMT-specific tasks, which results in one of the most efficient NMT toolkits available. Take a look at the benchmarks. If you are interested in Neural MT research and development, please join and contribute to the community on GitHub.

Our improvements concerning mixed-precision training and decoding, as well as large-model training, will soon be made available in the public GitHub repository.

We are excited about the future of neural machine translation. We will continue to roll out the new model architecture to the remaining languages and Custom Translator throughout this year. Our users will automatically get the significantly better-quality translations through the Translator API, our Translator app, Microsoft Office, and the Edge browser. We hope the new improvements help your personal and professional lives and look forward to your feedback.

 

References

  • Jimmy Ba and Rich Caruana. 2014. Do Deep Nets Really Need to be Deep? Advances in Neural Information Processing Systems 27. Pages 2654-2662. https://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep
  • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, Ming Zhou. 2018. Achieving Human Parity on Automatic Chinese to English News Translation. http://arxiv.org/abs/1803.05567
  • Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma. 2016. Dual Learning for Machine Translation. Advances in Neural Information Processing Systems 29. Pages 820-828. https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation
  • Marcin Junczys-Dowmunt. 2018a. Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora. Proceedings of the Third Conference on Machine Translation: Shared Task Papers. Belgium, pages 888-895. https://aclweb.org/anthology/papers/W/W18/W18-6478/
  • Marcin Junczys-Dowmunt. 2018b. Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data. Proceedings of the Third Conference on Machine Translation: Shared Task Papers. Belgium, pages 425-430. https://www.aclweb.org/anthology/W18-6415/
  • Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman Grundkiewicz, Anthony Aue. 2018a. Marian: Cost-effective High-Quality Neural Machine Translation in C++. Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Melbourne, Australia, pages 129-135. https://aclweb.org/anthology/papers/W/W18/W18-2716/
  • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch. 2018b. Marian: Fast Neural Machine Translation in C++. Proceedings of ACL 2018, System Demonstrations. Melbourne, Australia, pages 116-121. https://www.aclweb.org/anthology/P18-4020/
  • Yoon Kim and Alexander M. Rush. 2016. Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1317–1327. https://aclweb.org/anthology/papers/D/D16/D16-1139/
  • Philipp Koehn, Hieu Hoang. 2007. Factored Translation Models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic, pages 868-876. https://www.aclweb.org/anthology/D07-1091/
  • Rico Sennrich, Barry Haddow. 2016. Linguistic Input Features Improve Neural Machine Translation. Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers. Berlin, Germany, pages 83-91. https://www.aclweb.org/anthology/W16-2209/
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30. Pages 5998-6008. https://papers.nips.cc/paper/7181-attention-is-all-you-need
