{"id":743044,"date":"2021-06-22T19:44:57","date_gmt":"2021-06-23T02:44:57","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=743044"},"modified":"2021-06-22T19:44:57","modified_gmt":"2021-06-23T02:44:57","slug":"microsoft-lreasoner-leads-the-reclor-challenge-on-logical-reasoning","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/microsoft-lreasoner-leads-the-reclor-challenge-on-logical-reasoning\/","title":{"rendered":"Microsoft LReasoner leads the ReClor challenge on logical reasoning"},"content":{"rendered":"
Recently, the industry has witnessed the growth of highly advanced and powerful AI language models. While the industry marvels at their variety of skills, such as drawing, writing, and game-playing, it also worries about their IQ. For example, if you ask an advanced language model the following question:<\/p>\n
Question: How many eyes does the sun have?<\/em>
\nModel: Sun has one eye.<\/em>
\nCorrect answer from humans: The sun is a star, and it has no eyes.<\/em><\/p>\n
The reason for this type of mistake is that, when asked, the language model did not infer the relationship between the sun and eyes. From a technical perspective, a possible explanation is that most current natural language processing technologies use the \u201cpre-training + fine-tuning\u201d paradigm. This paradigm achieves superior performance on tasks that require shallow semantic matching and understanding of text. However, whether a pre-trained language model really has reasoning ability, and whether it can cope with tasks that require complex reasoning, remains an open problem in current research.<\/p>\n To solve the logical reasoning problem for machines, the Natural Language Computing Group of Microsoft Research Asia proposed the LReasoner system, which helps the model find the answer to a question by recognizing the logical symbols and expressions in the text.<\/p>\n When the researchers tested the LReasoner system on the ReClor dataset, which focuses on the logical reasoning section of the Law School Admission Test (LSAT), the system achieved state-of-the-art (SOTA) performance on the official evaluation leaderboard of the dataset. As a result, it significantly outperforms the human performance (Note: human performance refers to the average accuracy of 10 college students given in the ReClor paper) reported in the ReClor paper (Table 1).<\/p>\n
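The idea of recognizing logical expressions in text and extending them with logical equivalence laws can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only, not the actual LReasoner implementation: the function names, the regex-based extraction, and the contraposition rule shown here are assumptions chosen for brevity.

```python
import re

def extract_implications(text):
    """Extract simple 'if A, then B' statements as (premise, conclusion) pairs.

    Hypothetical, simplified illustration of logical-expression extraction;
    a real system would use far richer parsing than a single regex.
    """
    pattern = re.compile(r"if\s+(.+?),\s*then\s+(.+?)(?:\.|$)", re.IGNORECASE)
    return [(a.strip(), b.strip()) for a, b in pattern.findall(text)]

def contrapositive(implication):
    """Extend an implication via contraposition:
    (if A then B) is logically equivalent to (if not B then not A)."""
    premise, conclusion = implication
    return (f"not {conclusion}", f"not {premise}")

text = "If it rains, then the ground gets wet."
implications = extract_implications(text)
# implications == [("it rains", "the ground gets wet")]
extended = [contrapositive(i) for i in implications]
# extended == [("not the ground gets wet", "not it rains")]
```

Deriving such logically equivalent variants gives the model extra premises to match against answer options, which is the general intuition behind symbol-aware reasoning over plain text matching.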