Downloads
Impact of Controlled Language on Machine-Translation Quality and Post-Editing Efforts
September 2007
Results from experiments conducted by Microsoft Research’s Machine Translation Incubation Team to investigate how using good English (controlled language) affects post-editing productivity and the overall quality of our statistical machine-translation system.
Size: 334104
Microsoft Research Asia Chinese Word-Segmentation Data Set
August 2007
A set of manually annotated Chinese word-segmentation data and specifications for training and testing a Chinese word-segmentation system for research purposes. The data was extracted from the People’s Daily, which we have licensed for commercial usage, and the annotation was…
Microsoft Research IME Corpus
December 2005
This download consists of data only: it provides a test data set for the task of Japanese character conversion for text input. The data set comprises: (1) reference files, which contain Japanese sentences that are randomly extracted from…
Size: 4495451
Microsoft Research Paraphrase Corpus
March 2005
This download consists of data only: a text file containing 5,800 pairs of sentences extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. No more than…