More than 6,000 languages are spoken across the world, but only about 100 of them are supported by existing commercial machine translation (MT) tools. Neural machine translation (NMT) is now the de facto approach for translating between high-resource languages such as English, French, and German. Yet thousands of underserved languages need machine translation support in many real-world scenarios.
The major bottleneck is data. The lack of parallel corpora for a large number of languages prevents MT systems from being built for them. Combining neural technology with innovative HCI, we developed Interactive Neural Machine Translation (INMT). The INMT model was built on the existing open-source MT framework OpenNMT and trained on an initial seed corpus of parallel sentences.
The INMT tool assists human translators with on-the-fly hints and suggestions, making the end-to-end translation process faster and more efficient while producing high-quality translations. The translations produced with the tool form parallel datasets that can be fed back into the NMT models to further improve the system. We have also developed INMT-Lite, an offline mobile version of INMT, to work around low or no connectivity and to reach mobile-only users.
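To make the feedback loop concrete: OpenNMT-py trains from plain-text source and target files that are aligned line by line, so human-approved translations can be appended to such a pair of files and later used for retraining. The sketch below is a minimal illustration under that assumption, not INMT's actual pipeline; the function `append_parallel_pairs`, the file names `train.src`/`train.tgt`, and the English-French example pairs are hypothetical.

```python
import os
import tempfile
from typing import Iterable, Tuple


def append_parallel_pairs(
    pairs: Iterable[Tuple[str, str]], src_path: str, tgt_path: str
) -> int:
    """Append (source, translation) pairs to line-aligned corpus files.

    Hypothetical helper: each sentence is normalized onto a single line so the
    two files stay aligned, and pairs with an empty side are skipped.
    Returns the number of pairs written.
    """
    written = 0
    with open(src_path, "a", encoding="utf-8") as src_f, open(
        tgt_path, "a", encoding="utf-8"
    ) as tgt_f:
        for src, tgt in pairs:
            # Collapse all whitespace (including newlines) to single spaces.
            src = " ".join(src.split())
            tgt = " ".join(tgt.split())
            if not src or not tgt:
                continue  # an unpaired sentence would break line alignment
            src_f.write(src + "\n")
            tgt_f.write(tgt + "\n")
            written += 1
    return written


# Example: pairs a translator approved in one (hypothetical) editing session.
corpus_dir = tempfile.mkdtemp()
src_file = os.path.join(corpus_dir, "train.src")
tgt_file = os.path.join(corpus_dir, "train.tgt")
session = [
    ("Wash your hands.", "Lavez-vous les mains."),
    ("Drink\nclean water.", "Buvez de l'eau potable."),
    ("", "ignored"),  # skipped: no source sentence
]
count = append_parallel_pairs(session, src_file, tgt_file)
```

Appending rather than overwriting lets each session grow the same corpus, and skipping half-empty pairs keeps the i-th line of one file matched to the i-th line of the other, which is what line-aligned training data requires.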
We have been working with Pratham Books, Translators without Borders (TwB), and CGNet Swara on different translation projects. INMT grew out of our work with Pratham Books, which wanted to keep a human in the loop during translation to capture nuance and contextual knowledge. Pratham Books has used INMT to assist with and speed up the translation of children's storybooks.
TwB, a not-for-profit that provides translations in the areas of crisis relief, health, and education, is working on Congolese Swahili and French translations to make health information available to people in the strife-ridden region where these languages are spoken, in a language appropriate to them. TwB is using INMT in these efforts to create healthcare information in Congolese Swahili.
CGNet Swara, a citizen journalism portal that works with the Gondi-speaking tribal population in central India, wants to build a Hindi-Gondi translation system to make Hindi content available to the Gondi-speaking community. They have already collected some parallel data through the Karya crowdsourcing platform, which we used to seed Hindi-Gondi MT models. With these models in place, they can now use the mobile version, INMT-Lite, to accelerate data generation.
With this interactive machine translation tool, we hope to support human-aided translation not only for underserved languages but also for use cases in high-resource languages where a human in the loop can improve translation.
The open-sourced web- and mobile-based tools are available on GitHub: