{"id":276609,"date":"2016-08-15T08:57:03","date_gmt":"2016-08-15T15:57:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=276609"},"modified":"2018-10-16T21:20:15","modified_gmt":"2018-10-17T04:20:15","slug":"experiments-cross-lingual-systems-synthesis-code-mixed-text","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/experiments-cross-lingual-systems-synthesis-code-mixed-text\/","title":{"rendered":"Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text"},"content":{"rendered":"

Most Text to Speech (TTS) systems today assume that the input
\nis in a single language written in its native script, which is the
\nlanguage that the TTS database is recorded in. However, due
\nto the rise in conversational data available from social media,
\nphenomena such as code-mixing, in which multiple languages
\nare used together in the same conversation or sentence are now
\nseen in text. TTS systems capable of synthesizing such text
\nneed to be able to handle multiple languages at the same time,
\nand may also need to deal with noisy input. Previously, we
\nproposed a framework to synthesize code-mixed text by using
\na TTS database in a single language, identifying the language
\nthat each word was from, normalizing spellings of a language
\nwritten in a non-standardized script and mapping the phonetic
\nspace of mixed language to the language that the TTS database
\nwas recorded in. We extend this cross-lingual approach to more
\nlanguage pairs, and improve upon our language identification
\ntechnique. We conduct listening tests to determine which of the
\ntwo languages being mixed should be used as the target language.
\nWe perform experiments for code-mixed Hindi-English
\nand German-English and conduct listening tests with bilingual
\nspeakers of these languages. From our subjective experiments
\nwe find that listeners have a strong preference for cross-lingual
\nsystems with Hindi as the target language for code-mixed Hindi
\nand English text. We also find that listeners prefer cross-lingual
\nsystems in English that can synthesize German text for codemixed
\nGerman and English text.<\/p>\n","protected":false},"excerpt":{"rendered":"

Most Text to Speech (TTS) systems today assume that the input is in a single language written in its native script, which is the language that the TTS database is recorded in. However, due to the rise in conversational data available from social media, phenomena such as code-mixing, in which multiple languages are used together […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13545],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-276609","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2016-09-15","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"276618","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"ssw9_codemix_submit","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/ssw9_codemix_submit.pdf","id":276618,"label_id":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Sunayana Sitaram","user_id":0,"rest_url":false},{"type":"text","value":"Sai Krishna Rallabandi","user_id":0,"rest_url":false},{"type":"text","value":"Shruti Rijhwani","user_id":0,"rest_url":false},{"type":"text","value":"Alan W Black","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[144940],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/276609"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/276609\/revisions"}],"predecessor-version":[{"id":535011,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/276609\/revisions\/535011"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=276609"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=276609"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=276609"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=276609"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=276609"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=276609"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=276609"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=276609"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=276609"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=276609"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=276609"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=276609"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=276609"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=276609"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=276609"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}