Table 1: Accuracy of an MLP classifier for ambiguity prediction
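The probe behind Table 1 is a classifier that takes a word embedding as its only input and predicts whether the word is ambiguous. A minimal sketch of such a probe, with a hand-rolled one-hidden-layer MLP and synthetic "embeddings" standing in for real ones (all data and dimensions here are illustrative, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 "word embeddings" of dimension 20.
# Ambiguous words are simulated as the midpoint of two distinct sense
# vectors plus noise; unambiguous words as a single sense vector plus noise.
dim, n, h = 20, 200, 16
senses = rng.normal(size=(2, dim))
labels = rng.integers(0, 2, size=n)  # 1 = ambiguous
emb = np.where(
    labels[:, None] == 1,
    (senses[0] + senses[1]) / 2 + 0.1 * rng.normal(size=(n, dim)),
    senses[0] + 0.1 * rng.normal(size=(n, dim)),
)

# One-hidden-layer MLP probe trained with plain gradient descent on log loss.
W1 = rng.normal(scale=0.1, size=(dim, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=h);        b2 = 0.0

def forward(x):
    a = np.tanh(x @ W1 + b1)               # hidden activations
    p = 1 / (1 + np.exp(-(a @ W2 + b2)))   # P(word is ambiguous)
    return a, p

lr = 1.0
for _ in range(400):
    a, p = forward(emb)
    g = (p - labels) / n                   # d(logloss)/d(logit), per sample
    W2 -= lr * (a.T @ g); b2 -= lr * g.sum()
    ga = np.outer(g, W2) * (1 - a**2)      # backprop through tanh
    W1 -= lr * (emb.T @ ga); b1 -= lr * ga.sum(axis=0)

_, p = forward(emb)
acc = ((p > 0.5) == labels).mean()
print(f"probe accuracy: {acc:.2f}")
```

On synthetic data this separable, the probe's accuracy is near perfect; the interesting result in Table 1 is that a comparable probe also succeeds on real embeddings.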
In our paper, we also designed alternative word embedding representations. In these alternatives, we first learn a separate embedding for each meaning of a word and then aggregate them using a uniform or a frequency-weighted sum. These serve as baselines to contrast with word embeddings learned in the typical way.
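The two aggregation schemes above can be sketched in a few lines; the sense vectors and frequency counts here are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Hypothetical sense embeddings for one ambiguous word (e.g. "bank"):
# one vector per sense, learned separately, plus each sense's corpus frequency.
sense_vecs = np.array([
    [0.9, 0.1, 0.0],   # financial-institution sense
    [0.0, 0.2, 0.8],   # river-bank sense
])
sense_freq = np.array([900, 100])  # hypothetical occurrence counts

# Uniform sum: every sense contributes equally.
uniform = sense_vecs.mean(axis=0)          # averages to [0.45, 0.15, 0.4]

# Weighted sum: senses contribute in proportion to frequency,
# so the dominant sense dominates the aggregated vector.
weights = sense_freq / sense_freq.sum()    # [0.9, 0.1]
weighted = weights @ sense_vecs            # close to the dominant sense

print(uniform)
print(weighted)
```

The contrast between the two is the point of the baseline: the weighted vector sits near the frequent sense, while the uniform vector sits between the senses, much like a single embedding trained on all occurrences at once.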
We also evaluated our embeddings on five common NLP datasets; those results contrast with the results of the probing tasks.
We aim to deepen the NLP community's understanding of how word embeddings represent meanings. By looking closely at the regions of the vector space where multiple meanings coexist ambiguously, we found that probing tasks can predict with high accuracy whether a word embedding represents an ambiguous or an unambiguous word. We also found that, provided word meanings are frequent enough, word embedding models can capture multiple meanings well in a single vector, a finding reiterated throughout our work by both the semantic-class prediction probing task and the ambiguity probing task. Further information on occurrences of rare word senses can be found in our research paper.
Word embeddings have had a big impact on many applications in natural language processing (NLP) and information retrieval. It is, therefore, crucial to open the black box and understand their meaning representation. We propose probing tasks for analyzing the meaning representation in word embeddings. Our tasks are classification based, with word embeddings as the only input. […]
Yadollah Yaghoobzadeh, July 26, 2019