{"id":293774,"date":"2016-09-18T01:18:59","date_gmt":"2016-09-18T08:18:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=293774"},"modified":"2017-06-06T09:35:52","modified_gmt":"2017-06-06T16:35:52","slug":"conceptualization","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/conceptualization\/","title":{"rendered":"Conceptualization"},"content":{"rendered":"<p>The Conceptualization model maps textual entities to semantic concept categories, each with a probability that may depend on the context in which the entity appears. For example, \u201cMicrosoft\u201d could be automatically mapped to \u201cSoftware Company\u201d and \u201cFortune 500 company\u201d, among others, each with its own probability. It gives computers a common-sense computing capability and makes machines &#8220;aware&#8221; of the mental world of human beings, so that machines can better understand human communication in text. In detail, conceptualization maps instances or short texts into a large, automatically learned concept space, which is a vector space, with human-level concept reasoning. The result can be treated as a text embedding that is both human-understandable and machine-understandable. It therefore supports text concept tagging, short-text semantic similarity computation, and similar text-understanding tasks. It can benefit various text processing applications, including search engines, automatic question answering, online advertising, recommendation systems, and artificial intelligence systems. 
For more information, please refer to our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/concept.research.microsoft.com\/\">Microsoft Concept Graph <\/a>release page and our ACL 2016 tutorial &#8220;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.wangzhongyuan.com\/tutorial\/ACL2016\/Understanding-Short-Texts\/\">Understanding Short Texts<\/a>&#8221;.<\/p>\n<p><span class=\"spanhome\"><strong>1. Single instance conceptualization<\/strong> <\/span><\/p>\n<p>Single instance conceptualization returns a ranked list of automatically learned concept\/category names for any input entity mention\/instance. Each concept carries a probability that the input entity belongs to it. As a result, the input entity is represented as a numerical vector, which shows its distribution over the concept vector space.<\/p>\n<p>For human beings, given a single instance, this concept distribution often forms automatically and subconsciously. More importantly, categories at the appropriate level of generality rank higher. Psychologists and linguists call this <strong>Basic-level Categorization (BLC)<\/strong>.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/p1.png\" alt=\"\" width=\"500\" align=\"right\" \/>As an example, consider the term <strong><em>Microsoft<\/em><\/strong>, which can be categorized into a large number of concepts, ranging from extremely general to extremely specific, such as <strong><em>company<\/em><\/strong><em>, <strong>software company<\/strong>, and <strong>largest OS vendor<\/strong>. <\/em>If we go through <em>company<\/em>, we may find objects such as McDonald\u2019s and BMW, which have little in common with Microsoft. If we go through <em>largest OS vendor<\/em>, we may not be able to find any reasonable object other than Microsoft. 
On the other hand, if we go through <em>software company<\/em>, we may find Oracle, Adobe, and IBM, which are much more similar to Microsoft. Thus, <em>software company<\/em> is a more appropriate basic-level concept for Microsoft. In other words, properties associated with <em>software company<\/em> are more readily applied to Microsoft, which is also why going through <em>software company<\/em> yields many objects that are similar to Microsoft.<\/p>\n<p>In this release, we provide the concept distribution of the input text with basic-level conceptualization. In addition, common conceptualization measures, including MI, PMI, PMIk, and Typicality, are provided alongside it.<\/p>\n<p><b>A snapshot of the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/concept.research.microsoft.com\/Home\/Demo\">demo<\/a>:<\/b><\/p>\n<p>Given a single instance \u201cpython\u201d, the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/concept.research.microsoft.com\/Home\/Demo\">demo<\/a> returns concept distributions under different measures (including the <strong>BLC<\/strong> measure):<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/python.png\" align=\"middle\" \/><\/p>\n<p>You can integrate this <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/concept.research.microsoft.com\/help\/index#\/Concept\">single instance conceptualization service<\/a> into your own applications.<\/p>\n<p><span class=\"spanhome\"><strong>2. Single instance conceptualization with context<\/strong> <\/span><\/p>\n<p>Given \u201capple\u201d and \u201cpie\u201d, our API maps \u201capple\u201d to fruit-related senses.<br \/>\nGiven \u201capple\u201d and \u201cipad\u201d, our API maps \u201capple\u201d to company-related senses.<\/p>\n<p><img decoding=\"async\" 
src=\"https:\/\/concept.research.microsoft.com\/Images\/table2.png\" alt=\"\" align=\"middle\" \/><\/p>\n<p><strong><span class=\"spanhome\">3. Short text conceptualization<\/span><\/strong><\/p>\n<p>Given a short text such as \u201cthe engineer is eating the apple\u201d, our API performs segmentation, concept mapping, and sense disambiguation.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/apple3.png\" alt=\"\" align=\"middle\" \/><\/p>\n<h3><strong>References<\/strong><\/h3>\n<ol>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a> and Haixun Wang, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=264862\" target=\"_blank\" rel=\"noopener noreferrer\">Understanding Short Texts, <\/a>in the Association for Computational Linguistics (ACL) (Tutorial), August 2016.<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a>, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao,<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=255397\" target=\"_blank\" rel=\"noopener noreferrer\"> An Inference Approach to Basic Level of Categorization,<\/a> in ACM International Conference on Information and Knowledge Management (CIKM), ACM \u2013 Association for Computing Machinery, October 2015.<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a>, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" 
href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=245007\" target=\"_blank\" rel=\"noopener noreferrer\"> Query Understanding through Knowledge-Based Conceptualization,<\/a> in IJCAI, July 2015.<\/li>\n<li>Wen Hua, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a>, Haixun Wang, Kai Zheng, and Xiaofang Zhou,<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=231107\" target=\"_blank\" rel=\"noopener noreferrer\"> Short Text Understanding Through Lexical-Semantic Analysis,<\/a> in International Conference on Data Engineering (ICDE), April 2015. (<b>Best Paper Award<\/b>)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a>, Haixun Wang, and Zhirui Hu, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=203584\" target=\"_blank\" rel=\"noopener noreferrer\"> Head, Modifier, and Constraint Detection in Short Texts, <\/a>in International Conference on Data Engineering (ICDE), 2014.<\/li>\n<li>Yangqiu Song, Haixun Wang, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/wangzhongyuan.com\/en\/\">Zhongyuan Wang<\/a>, Hongsong Li, and Weizhu Chen, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=151341\" target=\"_blank\" rel=\"noopener noreferrer\"> Short Text Conceptualization using a Probabilistic Knowledgebase, <\/a>in IJCAI, 2011<\/li>\n<\/ol>\n<h3><strong>Contacts<\/strong><\/h3>\n<table style=\"height: 143px;\" width=\"535\">\n<tbody>\n<tr>\n<td 
style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/zhongyuan.png\" \/><br \/>\n<a class=\"contact\" href=\"http:\/\/wangzhongyuan.com\/en\/\" target=\"_blank\" rel=\"noopener noreferrer\">Zhongyuan Wang<\/a><\/td>\n<td style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/dawei.png\" \/><br \/>\n<a class=\"contact\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dawezh\/\">Dawei Zhang<\/a><\/td>\n<td style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/junyan.png\" \/><br \/>\n<a class=\"contact\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/junyan\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jun Yan<\/a><\/td>\n<td style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/weiying.png\" \/><br \/>\n<a class=\"contact\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/wyma\/\" target=\"_blank\" rel=\"noopener noreferrer\">Wei-Ying Ma<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><strong>Group<\/strong><\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/concept.research.microsoft.com\/Images\/email.png\" width=\"15\" \/>\u00a0\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/data-mining-enterprise-intelligence\/\">Data Mining and Enterprise Intelligence Group<\/a>, MSRA<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Conceptualization model aims to map text format entities into semantic concept categories with some probabilities, which may depend on the context texts of the entities. As an example, \u201cMicrosoft\u201d could be automatically mapped to \u201cSoftware Company\u201d and \u201cFortune 500 company\u201d etc. with some probabilities. 
It provides computers the common sense computing capability and make [&hellip;]<\/p>\n","protected":false},"featured_media":294923,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556,13545],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-293774","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2010-07-05","related-publications":[295037],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[],"msr_research_lab":[199560],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/293774"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/293774\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/294923"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=293774"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=293774"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=293774"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":
"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=293774"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=293774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}