{"id":763126,"date":"2021-07-24T00:19:32","date_gmt":"2021-07-24T07:19:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=763126"},"modified":"2021-07-24T00:21:19","modified_gmt":"2021-07-24T07:21:19","slug":"prolinguist-program-synthesis-for-linguistics-and-nlp","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/prolinguist-program-synthesis-for-linguistics-and-nlp\/","title":{"rendered":"ProLinguist: Program Synthesis for Linguistics and NLP"},"content":{"rendered":"<p>We introduce ProLinguist, an approach that uses <em>program synthesis<\/em> to automatically learn explicit string transformation rules from input-output examples for NLP tasks. Our algorithm learns not only rules in which the output depends on the surrounding input context, but also stateful rules, in which the output additionally depends on the results of applying transformation rules to that context. Our algorithms handle both small and large amounts of potentially noisy training data. Furthermore, an expert can control the learning process, as well as the level of abstraction of the inferred rules, by supplying linguistic knowledge to ProLinguist in the form of a domain-specific language (DSL). We demonstrate ProLinguist on a variety of NLP tasks, ranging from textbook phonology problems to the more complex task of grapheme-to-phoneme conversion for Hindi and Tamil, and show that it can produce interpretable rules from small amounts of training data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We introduce ProLinguist, an approach that uses program synthesis to automatically learn explicit string transformation rules from input-output examples for NLP tasks. 
Our algorithm is able to learn rules not only where the output depends on the surrounding input context, but also stateful rules, where it also depends on the results of applying transformation rules [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246694],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-763126","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-field-of-study-artificial-intelligence"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-8-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/nsnli.github.io\/assets\/ProLinguist.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Partho 
Sarthi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Monojit Choudhury","user_id":32996,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Monojit Choudhury"},{"type":"user_nicename","value":"Arun Iyer","user_id":36299,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Arun Iyer"},{"type":"user_nicename","value":"Suresh Parthasarathy","user_id":33762,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Suresh Parthasarathy"},{"type":"user_nicename","value":"Arjun Radhakrishna","user_id":39405,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Arjun Radhakrishna"},{"type":"user_nicename","value":"Sriram Rajamani","user_id":33711,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sriram Rajamani"}],"msr_impact_theme":[],"msr_research_lab":[199562],"msr_event":[],"msr_group":[],"msr_project":[813607,810280],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":813607,"post_title":"AI meets PL","post_name":"program-learning","post_type":"msr-project","post_date":"2022-01-19 05:15:20","post_modified":"2022-01-19 10:38:01","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/program-learning\/","post_excerpt":"In this area of research, we broadly explore combining machine learning and program synthesis in various ways. This is an umbrella project that has spawned several projects exploring applications of such a combination in different areas. 
Heterogeneous data extraction framework (HDEF): This project explores the benefits of combining program synthesis with machine learning for structured information extraction.\u00a0We use machine learning models (\u201cML models\u201d) such as conditional random fields to get an initial labeling of potential&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/813607"}]}},{"ID":810280,"post_title":"Synthesis and Machine Learning for Heterogeneous Extraction","post_name":"synthesis-and-machine-learning-for-heterogeneous-extraction","post_type":"msr-project","post_date":"2022-01-08 04:53:17","post_modified":"2022-02-02 09:14:57","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/synthesis-and-machine-learning-for-heterogeneous-extraction\/","post_excerpt":"In this project, we present a way to combine techniques from the program synthesis and machine learning communities to extract structured information from heterogeneous data. Such problems arise in several situations such as extracting attributes from web pages, machine-generated emails, or from data obtained from multiple sources. Our goal is to extract a set of structured attributes from such data. 
We use machine learning models (\u201cML models\u201d) such as conditional random fields to get an&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/810280"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/763126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/763126\/revisions"}],"predecessor-version":[{"id":763129,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/763126\/revisions\/763129"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=763126"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=763126"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=763126"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=763126"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=763126"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=763126"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=763126"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=763126"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=763126"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=763126"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=763126"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=763126"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=763126"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=763126"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=763126"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=763126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}