{"id":293774,"date":"2016-09-18T01:18:59","date_gmt":"2016-09-18T08:18:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=293774"},"modified":"2017-06-06T09:35:52","modified_gmt":"2017-06-06T16:35:52","slug":"conceptualization","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/conceptualization\/","title":{"rendered":"Conceptualization"},"content":{"rendered":"
The Conceptualization model aims to map entities mentioned in text to semantic concept categories, each with a probability that may depend on the entity's surrounding context. For example, \u201cMicrosoft\u201d could be automatically mapped to \u201cSoftware Company\u201d, \u201cFortune 500 company\u201d, and so on, each with a probability. It gives computers a common-sense computing capability and makes machines “aware” of the mental world of human beings, so that machines can better understand human communication in text. In detail, conceptualization maps instances or short texts into a large, automatically learned concept space (a vector space) with human-level concept reasoning. The result can be treated as a text embedding that is both human-understandable and machine-understandable. It thus supports text concept tagging, short-text semantic similarity computation, and other text understanding tasks. It can benefit various text processing applications, including search engines, automatic question answering, online advertising, recommendation systems, and artificial intelligence systems. For more information, please refer to our Microsoft Concept Graph<\/a> release page and our ACL 2016 tutorial “Understanding Short Texts<\/a>”.<\/p>\n 1. Single instance conceptualization<\/strong> <\/span><\/p>\n Single instance conceptualization returns a ranked list of automatically learned concept\/category names for any input entity mention\/instance. Each concept carries a probability that the input entity belongs to it. As a result, the input entity is represented as a numerical vector that shows its distribution over the concept vector space.<\/p>\n For human beings, given a single instance, this concept distribution often forms automatically and subconsciously. More importantly, categories at the appropriate level of abstraction rank higher. 
Psychologists and linguists call this Basic-level Categorization (BLC)<\/strong>.<\/p>\n As an example, consider the term Microsoft<\/em><\/strong>, which can be categorized into a large number of concepts, ranging from extremely general to extremely specific, such as company<\/em><\/strong>, software company<\/strong>, and largest OS vendor<\/strong>. <\/em>If we go through company<\/em>, we may find objects such as McDonald\u2019s and BMW, which have little in common with Microsoft. If we go through largest OS vendor<\/em>, we may not be able to find any reasonable object other than Microsoft. On the other hand, if we go through software company<\/em>, we may find Oracle, Adobe, and IBM, which are much more similar to Microsoft. Thus, software company is a more appropriate basic-level concept for Microsoft. In other words, properties associated with software company<\/em> apply more readily to Microsoft, which is also why software company<\/em> leads to many objects that are similar to Microsoft.<\/p>\n In this release, we will provide the concept distribution of the input text under basic-level conceptualization. In addition, common conceptualization measures, including MI, PMI, PMIk, and Typicality, will be provided alongside it.<\/p>\n A snapshot of the demo<\/a>:<\/b><\/p>\n Given a single instance \u201cpython\u201d, the demo<\/a> returns concept distributions under different measures (including the BLC<\/strong> measure):<\/p>\n <\/p>\n You can easily integrate this single instance conceptualization service<\/a> into your own applications.<\/p>\n 2. Single instance conceptualization with context<\/strong> <\/span><\/p>\n Given \u201capple\u201d and \u201cpie\u201d, our API maps \u201capple\u201d to fruit-related senses. <\/p>\n 3. Short text conceptualization<\/span><\/strong><\/p>\n Given a short text \u201cthe engineer is eating the apple\u201d, our API performs segmentation, concept mapping, and sense disambiguation.<\/p>\n <\/p>\n
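As a rough illustration of how the measures named in section 1 behave, the sketch below scores the concepts of one instance from instance-concept co-occurrence counts. Everything in it is an assumption made for illustration: the counts are invented toy numbers, the function names are hypothetical, and the combined score (the product of P(concept | instance) and P(instance | concept)) is only one plausible way to favor basic-level concepts, not necessarily the formula used by the service.

```python
import math
from collections import defaultdict

# Hypothetical instance-concept co-occurrence counts (toy numbers, not real data).
COUNTS = {
    ('microsoft', 'company'): 40,
    ('microsoft', 'software company'): 50,
    ('microsoft', 'largest os vendor'): 1,
    ('oracle', 'company'): 40,
    ('oracle', 'software company'): 25,
    ('bmw', 'company'): 100,
    ('mcdonalds', 'company'): 100,
}


def build_totals(counts):
    # Marginal counts per instance and per concept, plus the grand total.
    by_instance = defaultdict(int)
    by_concept = defaultdict(int)
    for (instance, concept), n in counts.items():
        by_instance[instance] += n
        by_concept[concept] += n
    return by_instance, by_concept, sum(counts.values())


def score_concepts(instance, counts=COUNTS):
    # Return typicality, PMI and an assumed basic-level score for each concept.
    by_instance, by_concept, total = build_totals(counts)
    scores = {}
    for (e, c), n in counts.items():
        if e != instance:
            continue
        p_c_given_e = n / by_instance[e]   # typicality of the concept for the instance
        p_e_given_c = n / by_concept[c]    # typicality of the instance for the concept
        p_joint = n / total
        pmi = math.log(p_joint / ((by_instance[e] / total) * (by_concept[c] / total)))
        scores[c] = {
            'typicality': p_c_given_e,
            'pmi': pmi,
            # Assumed basic-level score: high only when both directions are typical.
            'basic_level': p_c_given_e * p_e_given_c,
        }
    return scores


def rank(scores, measure):
    # Concepts sorted from highest to lowest under the chosen measure.
    return sorted(scores, key=lambda c: scores[c][measure], reverse=True)


if __name__ == '__main__':
    result = score_concepts('microsoft')
    for measure in ('typicality', 'pmi', 'basic_level'):
        print(measure, rank(result, measure))
```

With these toy counts, PMI ranks the overly specific concept (largest OS vendor) first, while the assumed basic-level score ranks software company first, which mirrors the Microsoft example in section 1.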
\nGiven \u201capple\u201d and \u201cipad\u201d, our API maps \u201capple\u201d to company-related senses.<\/p>\nReferences<\/strong><\/h3>\n
\n
Contacts<\/strong><\/h3>\n
\n\n
\n
\nZhongyuan Wang<\/a><\/td>\n
\nDawei Zhang<\/a><\/td>\n
\nJun Yan<\/a><\/td>\n
\nWei-Ying Ma<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\nGroup<\/strong><\/h3>\n