{"id":394715,"date":"2017-06-29T07:46:23","date_gmt":"2017-06-29T14:46:23","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=394715"},"modified":"2018-10-16T19:59:40","modified_gmt":"2018-10-17T02:59:40","slug":"distant-supervision-cancer-pathway-extraction-text","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/distant-supervision-cancer-pathway-extraction-text\/","title":{"rendered":"Distant Supervision for Cancer Pathway Extraction from Text"},"content":{"rendered":"

Biological pathways are central to understanding complex diseases such as cancer. The majority of this knowledge is scattered in the vast and rapidly growing research literature. To automate knowledge extraction, machine learning approaches typically require annotated examples, which are expensive and time-consuming to acquire. Recently, there has been increasing interest in leveraging databases for distant supervision in knowledge extraction, but existing applications focus almost exclusively on newswire domains. In this paper, we present the first attempt to formulate the distant supervision problem for pathway extraction and apply a state-of-the-art method to extracting pathway interactions from PubMed abstracts. Experiments show that distant supervision can effectively compensate for the lack of annotation, attaining an accuracy approaching supervised results. From 22 million PubMed abstracts, we extracted 1.5 million pathway interactions at a precision of 25%. More than 10% of interactions are mentioned in the context of one or more cancer types, analysis of which yields interesting insights.<\/p>\n","protected":false},"excerpt":{"rendered":"

Biological pathways are central to understanding complex diseases such as cancer. The majority of this knowledge is scattered in the vast and rapidly growing research literature. To automate knowledge extraction, machine learning approaches typically require annotated examples, which are expensive and time-consuming to acquire. Recently, there has been increasing interest in leveraging databases for distant supervision in knowledge extraction, […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13553],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-394715","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-medical-health-genomics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"Pacific Symposium on Biocomputing, 
2015","msr_affiliation":"","msr_published_date":"2015-01-04","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"386951","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"psb15","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/05\/psb15.pdf","id":386951,"label_id":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"hoifung","user_id":32016,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=hoifung"},{"type":"text","value":"Kristina 
Toutanova","user_id":0,"rest_url":false},{"type":"user_nicename","value":"chrisq","user_id":31430,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=chrisq"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[952050],"msr_project":[393227],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/394715"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/394715\/revisions"}],"predecessor-version":[{"id":394718,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/394715\/revisions\/394718"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=394715"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=394715"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=394715"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=394715"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=394715"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=394715"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.mi
crosoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=394715"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=394715"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=394715"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=394715"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=394715"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=394715"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=394715"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=394715"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=394715"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}