{"id":699412,"date":"2020-10-21T10:01:55","date_gmt":"2020-10-21T17:01:55","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=699412"},"modified":"2021-03-09T15:11:10","modified_gmt":"2021-03-09T23:11:10","slug":"expanding-semantic-search-into-biomed-with-medical-subject-headings-mesh","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/expanding-semantic-search-into-biomed-with-medical-subject-headings-mesh\/","title":{"rendered":"Expanding Semantic Search into Biomed with Medical Subject Headings (MeSH)"},"content":{"rendered":"

We’re excited to announce that Microsoft Academic<\/span><\/a> (MA) users can now explore biomedical publications using Medical Subject Heading<\/span><\/a> (MeSH) terms in semantic search.<\/p>\n

MeSH is a controlled, hierarchically organized vocabulary that the National Institutes of Health<\/span><\/a> (NIH) maintains for indexing, cataloging, and searching biomedical databases such as PubMed<\/span><\/a>. Since releasing the new version of MA nearly five years ago, we have observed a growing number of user queries phrased in MeSH terminology. That observation, coupled with the prevalence of biomedical literature in the Microsoft Academic Graph<\/span><\/a> (MAG), led us to integrate MeSH into MA’s unique semantic search capabilities.<\/p>\n

MA users can now access MeSH through two new semantic attributes: MeSH descriptors<\/span><\/a>, marked with a darker icon, and MeSH qualifiers<\/span><\/a>, marked with a lighter icon.<\/p>\n

[Figure: query suggestions showing the MeSH descriptor and qualifier attributes]<\/p>\n

Revisiting semantic search<\/h2>\n

One of MA’s core differentiators has always been its emphasis on semantic search. A keyword search engine performs best when users pick the \u201cright\u201d keywords, the ones that match how the content is indexed; semantic search is designed for cases where it is not clear what those keywords should be. For example, suppose you want to find the most influential publications in artificial intelligence<\/span><\/a> (AI). Given the query \u201cartificial intelligence\u201d, a keyword-based search engine returns papers whose title or body explicitly contains the query terms, and so misses influential AI publications that do not contain those specific terms. A semantic search engine like MA, on the other hand, can overcome this limitation.<\/p>\n

At the time of writing, the top results for the query \u201cartificial intelligence\u201d on MA are articles that demonstrate the efficacy of deep convolutional neural networks for computer vision. These trend-setting articles do not mention \u201cartificial intelligence\u201d anywhere in their titles, abstracts, or even full text, so keyword search will not retrieve them unless additional field-of-study annotations are also indexed as keywords.<\/p>\n
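The keyword limitation described above can be illustrated with a toy sketch. It is not how MA works. The Doc type, the keyword_search function and the two sample documents are all hypothetical, and matching is a plain case-insensitive phrase lookup. The sketch shows that a paper which never mentions artificial intelligence is found only after its field-of-study annotations are indexed as extra keywords.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str                      # title, abstract and body, flattened
    topics: Tuple[str, ...] = ()   # hypothetical field-of-study annotations


def keyword_search(docs: List[Doc], query: str, index_topics: bool = False) -> List[str]:
    # Plain phrase lookup: a document matches only if the query text appears
    # verbatim (case-insensitively) in what was indexed for it.
    phrase = query.lower()
    hits = []
    for doc in docs:
        indexed = doc.text.lower()
        if index_topics:
            # Treat topic annotations as extra keywords in the index.
            indexed += ' ' + ' '.join(t.lower() for t in doc.topics)
        if phrase in indexed:
            hits.append(doc.doc_id)
    return hits


# Hypothetical documents; only d2 uses the phrase in its own text.
docs = [
    Doc('d1', 'Deep convolutional networks for image classification',
        ('artificial intelligence', 'computer vision')),
    Doc('d2', 'A survey of artificial intelligence', ('artificial intelligence',)),
]

print(keyword_search(docs, 'artificial intelligence'))                     # ['d2']
print(keyword_search(docs, 'artificial intelligence', index_topics=True))  # ['d1', 'd2']
```

Indexing annotations as keywords fixes this particular case. However, retrieval still depends on exact phrase overlap with whatever was indexed.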

However, some scenarios call for search behavior that cannot be addressed so easily, even by indexing extra annotations as keywords. This is where our semantic search truly shines.<\/p>\n

What are composite attributes?<\/h2>\n

Composite data relationships are one such scenario. Because researchers move from one institution to another, an author’s publications are often affiliated with different institutions. At the same time, authors often keep collaborating with colleagues at their previous institutions. A query consisting of an author and an institution can therefore mean either of two things: the author’s work while affiliated with the institution, or the author’s collaborative work with that institution. We can distinguish these two meanings by modeling the author-affiliation relationship as a composite attribute of a publication. Our API users have always been able to express this nuanced intent using the composite query function<\/span><\/a>, and we are now making the same capability available to users of our website.<\/p>\n
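To make the two readings concrete, here is a minimal sketch. It assumes a simplified, hypothetical record in which each paper lists (author, affiliation) pairs. The names AuthorEntry, Paper, while_at and lumped, and the sample data, are illustrative only and are not part of the MA API. A composite match requires both values on the same author entry. An independent match lets the values come from different entries, so it mixes both interpretations together.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AuthorEntry:
    author: str
    affiliation: str   # affiliation listed for this author on this paper


@dataclass(frozen=True)
class Paper:
    title: str
    authors: Tuple[AuthorEntry, ...]


def while_at(papers: List[Paper], author: str, affiliation: str) -> List[str]:
    # Composite match: one author entry must carry BOTH values, i.e. the
    # author wrote the paper while affiliated with the institution.
    return [p.title for p in papers
            if any(a.author == author and a.affiliation == affiliation
                   for a in p.authors)]


def lumped(papers: List[Paper], author: str, affiliation: str) -> List[str]:
    # Independent match: author and institution may come from different
    # entries, so collaborations with the institution are included too.
    return [p.title for p in papers
            if any(a.author == author for a in p.authors)
            and any(a.affiliation == affiliation for a in p.authors)]


# Hypothetical data: alice moved from lab q to univ p but kept a lab q coauthor.
papers = [
    Paper('paper-1', (AuthorEntry('alice', 'lab q'),)),
    Paper('paper-2', (AuthorEntry('alice', 'univ p'), AuthorEntry('bob', 'lab q'))),
    Paper('paper-3', (AuthorEntry('alice', 'univ p'),)),
]

print(while_at(papers, 'alice', 'lab q'))  # ['paper-1']
print(lumped(papers, 'alice', 'lab q'))    # ['paper-1', 'paper-2']
```

Here, paper-2 is the collaboration case. Alice wrote it while at univ p with a lab q coauthor, so it is returned by the lumped query but not by the composite one.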

Take Turing Award winner Yann LeCun<\/span><\/a> as an example. A renowned computer scientist, he has had a productive career at AT&T Bell Labs<\/span><\/a>, the Courant Institute at New York University<\/span><\/a> and, most recently, Facebook<\/span><\/a>. Previously, MA handled the query \u201cYann LeCun New York University\u201d by lumping together the results of both interpretations. MA users can now enter \u201cYann LeCun while at New York University\u201d to narrow the search to papers written while the author was affiliated with New York University. Because the goal of semantic search is to zoom in on the most relevant results, expressing intent more precisely helps quickly filter the massive result sets that a keyword search engine would produce. For example, MA returns only one result for the query \u201cYann LeCun while at New York University Bell Labs\u201d: a paper that Yann LeCun, while at New York University, coauthored with a Bell Labs researcher. None of the papers Yann published while he worked at Bell Labs are included in the search results, as shown below (note: be sure to select the query suggestion, as explained in the MA FAQ<\/span><\/a>):<\/p>\n

[Figure: search results for \u201cYann LeCun while at New York University Bell Labs\u201d]<\/p>\n

Similarly, the query \u201cYann LeCun Bell Labs\u201d is now treated as ambiguous, and MA helps the user clarify their intent with disambiguating query suggestions:<\/p>\n

[Figure: disambiguating query suggestions for \u201cYann LeCun Bell Labs\u201d]<\/p>\n

MeSH as a composite attribute<\/h2>\n

Composite attributes provide a powerful mechanism for grouping concepts that should be processed together, and handling Medical Subject Headings (MeSH)<\/span><\/a> further demonstrates their efficacy.<\/p>\n

In the MeSH implementation now available on MA, two basic types of MeSH records are included: the descriptor (aka main heading) and the qualifier (aka subheading). Descriptors characterize the subject matter or content of an article, while qualifiers are used in connection with descriptors to define a particular aspect of a subject.<\/p>\n

A good way to understand the differences between descriptors and qualifiers, and our rationale for keeping them as distinct fields in a composite attribute, is to look at terms that can play either role. Take \u201cmortality\u201d as an example. MA now distinguishes the two roles this term can play directly in the query suggestion dropdown, using a darker icon for a descriptor and a lighter icon for a qualifier:<\/p>\n

[Figure: query suggestions for \u201cmortality\u201d as a MeSH descriptor and as a MeSH qualifier]<\/p>\n

If you click the fourth suggestion to instruct MA to interpret \u201cmortality\u201d as a descriptor, the \u201cTop Topics\u201d on the left rail of the search result page show that research on this subject commonly co-occurs with topics in \u201cdemography\u201d, \u201cpopulation\u201d and \u201cpublic health\u201d.<\/p>\n

[Figure: search results for \u201cmortality\u201d as a MeSH descriptor]<\/p>\n

Further down the search result page are new sections for top co-occurring MeSH descriptors. These show that mortality is typically studied alongside other subjects such as sex (male vs. female), age, and geography. Similarly, the top related MeSH qualifiers show that research articles on mortality commonly come from the areas of epidemiology or etiology, and that the top topics include mortality trends and prevention and control:<\/p>\n

[Figure: top co-occurring MeSH descriptors and qualifiers for \u201cmortality\u201d]<\/p>\n

In contrast, when you ask MA to interpret \u201cmortality\u201d as a qualifier, you can see that mortality is often an aspect of \u201cinternal medicine\u201d, \u201csurgery\u201d, \u201ccardiology\u201d or \u201ccancer\u201d research. Take heart attack (MeSH descriptor \u201cmyocardial infarction\u201d) as an example. As MA now shows, this area of research is studied from many aspects. Besides \u201cmortality\u201d, these range from \u201cdrug treatment\u201d to \u201ccomplications\u201d:<\/p>\n

[Figure: query suggestions pairing \u201cmyocardial infarction\u201d with MeSH qualifiers]<\/p>\n

In this example, if you want to focus on articles about the mortality rate of heart attacks, you can select the first query suggestion, \u201cmyocardial infarction in relation to mortality\u201d. On the subsequent search result page, all the top-most results match the \u201cmyocardial infarction\/mortality\u201d descriptor\/qualifier pair, as indicated by the highlighted tag on each result.<\/p>\n

Note the \u201c*\u201d that can appear on these tags. It is a MeSH convention that marks a \u201cmajor topic<\/span><\/a>\u201d of an article. MA uses this major topic flag as one of many signals for ranking search results. However, because search rankings are influenced by many factors, an article whose major topic matches the query perfectly can still rank below others whose major topics match less tightly.<\/p>\n

Returning to query formulation: as with the author\/affiliation example above, when MA encounters the ambiguous query \u201cheart attack mortality\u201d, it now generates two suggestions that reflect distinct interpretations:<\/p>\n

[Figure: query suggestions for \u201cheart attack mortality\u201d]<\/p>\n

The first interpretation returns results explicitly about the mortality of heart attacks. The second query suggestion returns a larger set. It also includes articles about mortality from other diseases (not specifically heart attacks) that mention heart attacks, e.g., as a preexisting condition. To put it another way, the first interpretation is more<\/em> specific and the second less<\/em> specific.<\/p>\n

As with author\/affiliation metadata, modeling MeSH concepts as composite attributes is what enables this behavior in semantic search. It also allows descriptor and qualifier values to be queried independently of each other.<\/p>\n
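As with the author/affiliation case described in the text, the two MeSH interpretations can be sketched with a simplified, hypothetical model. MeshHeading, Article and the helper functions are illustrative names, not MA's implementation. Each article carries descriptor/qualifier pairs. Matching the pair within one heading gives the more specific interpretation. Matching the descriptor and qualifier anywhere on the article gives the less specific one. A single field can also be queried on its own. Mapping a phrase such as heart attack to the descriptor myocardial infarction is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MeshHeading:
    descriptor: str                  # main heading: the subject matter
    qualifier: Optional[str] = None  # subheading: the aspect of that subject


@dataclass(frozen=True)
class Article:
    article_id: str
    mesh: Tuple[MeshHeading, ...]


def pair_match(articles: List[Article], descriptor: str, qualifier: str) -> List[str]:
    # More specific: descriptor and qualifier must belong to the SAME heading.
    return [a.article_id for a in articles
            if any(h.descriptor == descriptor and h.qualifier == qualifier
                   for h in a.mesh)]


def independent_match(articles: List[Article], descriptor: str, qualifier: str) -> List[str]:
    # Less specific: each value may come from a different heading.
    return [a.article_id for a in articles
            if any(h.descriptor == descriptor for h in a.mesh)
            and any(h.qualifier == qualifier for h in a.mesh)]


def qualifier_only(articles: List[Article], qualifier: str) -> List[str]:
    # One field of the composite attribute, queried on its own.
    return [a.article_id for a in articles
            if any(h.qualifier == qualifier for h in a.mesh)]


# Hypothetical articles: a2 is about stroke mortality and also mentions
# myocardial infarction, but with a different qualifier.
articles = [
    Article('a1', (MeshHeading('myocardial infarction', 'mortality'),)),
    Article('a2', (MeshHeading('myocardial infarction', 'complications'),
                   MeshHeading('stroke', 'mortality'))),
    Article('a3', (MeshHeading('myocardial infarction', 'diagnosis'),)),
]

print(pair_match(articles, 'myocardial infarction', 'mortality'))         # ['a1']
print(independent_match(articles, 'myocardial infarction', 'mortality'))  # ['a1', 'a2']
print(qualifier_only(articles, 'mortality'))                              # ['a1', 'a2']
```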

As MeSH concepts overlap significantly with MA\u2019s existing topics, we\u2019ve also provided new scoping triggers for MeSH so that queries can be specified more precisely. The MeSH scopes are listed first, in bold, followed by the other supported scopes:<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Scope<\/strong><\/td>\n<td><strong>Description<\/strong><\/td>\n<td><strong>Example<\/strong><\/td>\n<\/tr>\n
<tr>\n<td><strong>mesh:<\/strong><\/td>\n<td><strong>Match MeSH descriptor and\/or qualifier<\/strong><\/td>\n<td><strong>mesh: heart attack<\/strong><br>
\n<strong>mesh: mortality<\/strong><br>
\n<strong>mesh: heart attack mortality<\/strong><br>
\n<strong>mesh: heart attack in relation to mortality<\/strong><\/td>\n<\/tr>\n
<tr>\n<td><strong>mesh descriptor<\/strong><\/td>\n<td><strong>Match MeSH descriptor<\/strong><\/td>\n<td><strong>mesh descriptor heart attack<\/strong><\/td>\n<\/tr>\n
<tr>\n<td><strong>mesh qualifier<\/strong><\/td>\n<td><strong>Match MeSH qualifier<\/strong><\/td>\n<td><strong>mesh qualifier diagnosis<\/strong><\/td>\n<\/tr>\n
<tr>\n<td>abstract:<\/td>\n<td>Match term or quoted value from the paper abstract<\/td>\n<td>abstract: “heterogeneous entity graph comprised of six types of entities”<\/td>\n<\/tr>\n
<tr>\n<td>affiliation:<\/td>\n<td>Match affiliation (institution) name<\/td>\n<td>affiliation: “microsoft research”<\/td>\n<\/tr>\n
<tr>\n<td>author:<\/td>\n<td>Match author name<\/td>\n<td>author: “darrin eide”<\/td>\n<\/tr>\n
<tr>\n<td>conference:<\/td>\n<td>Match conference series name<\/td>\n<td>conference: www<\/td>\n<\/tr>\n
<tr>\n<td>doi:<\/td>\n<td>Match paper Digital Object Identifier (DOI)<\/td>\n<td>doi: 10.1037\/0033-2909.105.1.156<\/td>\n<\/tr>\n
<tr>\n<td>journal:<\/td>\n<td>Match journal name<\/td>\n<td>journal: nature<\/td>\n<\/tr>\n
<tr>\n<td>title:<\/td>\n<td>Match term or quoted value from the paper title<\/td>\n<td>title: “an overview of microsoft academic service mas and applications”<\/td>\n<\/tr>\n
<tr>\n<td>topic:<\/td>\n<td>Match paper topic (field of study)<\/td>\n<td>topic: “knowledge base”<\/td>\n<\/tr>\n
<tr>\n<td>year:<\/td>\n<td>Match paper publication year<\/td>\n<td>year: 2015<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n
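The scopes in the table above split a query into a scope and a value. A minimal parsing sketch of that idea follows. It is hypothetical and is not MA's actual query grammar, and parse_scope and split_pair are illustrative names. Multi-word scopes are checked first, so mesh descriptor wins over mesh:. Surrounding straight or curly quotes are stripped from the value. The in relation to phrasing is then split into a descriptor/qualifier pair.

```python
from typing import Optional, Tuple

# Multi-word scopes come first so they are tried before the shorter 'mesh:'.
SCOPES = ('mesh descriptor', 'mesh qualifier', 'mesh:', 'abstract:',
          'affiliation:', 'author:', 'conference:', 'doi:', 'journal:',
          'title:', 'topic:', 'year:')
QUOTES = '"\u201c\u201d'  # straight and curly double quotes


def parse_scope(query: str) -> Tuple[Optional[str], str]:
    q = query.strip()
    lowered = q.lower()
    for scope in SCOPES:
        if not lowered.startswith(scope):
            continue
        rest = q[len(scope):]
        # Scopes without a colon must be followed by a space.
        if not scope.endswith(':') and not rest.startswith(' '):
            continue
        return scope.rstrip(':'), rest.strip().strip(QUOTES).strip()
    return None, q  # no scope: leave the whole query to semantic interpretation


def split_pair(value: str) -> Tuple[str, Optional[str]]:
    # Hypothetical: read 'X in relation to Y' as descriptor X, qualifier Y.
    head, sep, tail = value.partition(' in relation to ')
    return (head, tail) if sep else (value, None)


print(parse_scope('mesh qualifier diagnosis'))         # ('mesh qualifier', 'diagnosis')
print(parse_scope('author: \u201cdarrin eide\u201d'))  # ('author', 'darrin eide')
print(split_pair(parse_scope('mesh: heart attack in relation to mortality')[1]))
# ('heart attack', 'mortality')
```

A fuller version would also need to resolve synonyms such as heart attack to the descriptor myocardial infarction, which this sketch does not attempt.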

 <\/p>\n

In closing, we are excited about the addition of MeSH to MA and the opportunities it opens up for the research community. As always, we love getting feedback and try to respond to as much of it as possible. To provide feedback, navigate to Microsoft Academic<\/span><\/a> and click the \u201cfeedback\u201d icon in the lower right-hand corner.<\/p>\n

Happy researching!<\/p>\n","protected":false},"excerpt":{"rendered":"

Microsoft Academic users can now explore biomedical publications using Medical Subject Heading (MeSH) terms in semantic search.<\/p>\n","protected":false},"author":36554,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"msr-content-parent":170262,"footnotes":""},"research-area":[],"msr-locale":[268875],"class_list":["post-699412","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":170262,"type":"project"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/699412"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/36554"}],"version-history":[{"count":18,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/699412\/revisions"}],"predecessor-version":[{"id":732004,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/699412\/revisions\/732004"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=699412"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=699412"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=699412"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}