Microsoft LReasoner leads the ReClor challenge on logical reasoning

For many years AI researchers have sought to build upon traditional machine learning, which trains technology to process facts and learn from them, and develop machine reasoning, in which programs apply logic to data and solve problems – comparable to the way humans think. For a system to analyze multiple sets of logical arguments, it requires both critical thinking and combinatorial reasoning abilities.

One of the current benchmarks for evaluating a system’s logical reasoning ability is ReClor, a Reading Comprehension Dataset Requiring Logical Reasoning. ReClor is a dataset built from logical reasoning problems used in standardized admission tests, including the Law School Admission Test (LSAT) and Graduate Management Admission Test (GMAT).

Today, we are excited to announce that Microsoft’s LReasoner system is the top-rated performer on the official ReClor leaderboard. LReasoner also significantly exceeded human performance, which the ReClor paper reports as the average accuracy of 10 college students who each answered 10 randomly selected test questions.

Figure 1: LReasoner achieves state-of-the-art performance on the official ReClor leaderboard. The leaderboard screenshot shows LReasoner at the top, with an overall score of 76.1, a Test-Easy score of 87.05, and a Test-Hard score of 67.5. RainaCUED (Electra) is second with scores of 67.1, 80.91, and 56.25, and zhiweihu (ALBERT) is third with scores of 62.6, 73.64, and 53.93.

The LSAT is a standardized exam established in 1947 that has become an essential benchmark in the law school admissions process. An example of the LSAT’s challenging logical reasoning questions is shown in Figure 2. To answer such a question, a system needs to understand logical symbols such as “have keyboarding skills”, make complex inferences that extend the existing logical expressions according to logical rules, and then match the resulting logical expressions against the answer options.

Figure 2: An example of a logical reasoning test question
Context: If you have no keyboarding skills at all, you will not be able to use a computer. And if you are not able to use a computer, you will not be able to write your essays using a word processing program.
Question: If the statements above are true, which of the following must be true?
Options:
A. If you are not able to write your essays using a word processing program, you have no keyboarding skills
B. If you're able to write your essays using a word processing program, you have at least some keyboarding skills
C. If you are not able to write your essays using a word processing program, you are not able to use a computer
D. If you have some keyboarding skills, you will be able to write your essays using a word processing program
Option B is shown in green in the figure, indicating that it is the most plausible answer.
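To make the three steps concrete (identify symbols, extend the expressions, match the options), here is a minimal Python sketch applied to the Figure 2 question. It is an illustration under stated assumptions, not LReasoner's implementation: the symbols and implications are written out by hand rather than extracted from text, and the helper names `negate` and `extend` are hypothetical.

```python
from itertools import product

# A literal is (symbol, is_positive); an implication is (premise, conclusion).
# These encodings are hand-written assumptions for illustration only.

def negate(literal):
    symbol, is_positive = literal
    return (symbol, not is_positive)

def extend(implications):
    """Close a set of implications under contraposition and transitivity."""
    known = set(implications)
    while True:
        new = set()
        for premise, conclusion in known:
            # Contraposition: (p -> q) gives (not q -> not p).
            new.add((negate(conclusion), negate(premise)))
        for (p, q), (r, s) in product(known, repeat=2):
            # Transitivity: (p -> q) and (q -> s) give (p -> s).
            if q == r and p != s:
                new.add((p, s))
        if new <= known:
            return known
        known |= new

K = "have keyboarding skills"
C = "able to use a computer"
W = "able to write essays using a word processing program"

# Context: no K -> no C, and no C -> no W.
context = {((K, False), (C, False)), ((C, False), (W, False))}

options = {
    "A": ((W, False), (K, False)),
    "B": ((W, True), (K, True)),
    "C": ((W, False), (C, False)),
    "D": ((K, True), (W, True)),
}

extended = extend(context)
entailed = [label for label, expr in options.items() if expr in extended]
print(entailed)
```

Only option B appears in the extended set: it is the contrapositive of the chained implication from "no keyboarding skills" to "not able to write essays". Options A, C, and D reverse a conditional or negate both sides without reversing it, so they are not derivable from the context.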

To address this logical reasoning challenge in a realistic scenario, the Natural Language Computing Group of Microsoft Research Asia proposed the LReasoner system, which helps the model find the answer to a problem by recognizing logical symbols and logical expressions in the text.

LReasoner improves the reasoning ability of previous pre-trained language models by using two novel techniques: (1) a logic-driven context extension framework (illustrated in Figure 3), which first identifies logical symbols in the context and then infers extended logical expressions through logical equivalence laws, and (2) a logic-driven data augmentation algorithm, which applies contrastive learning with logically different contexts to help the model better capture logical information, especially logical negation and conditional relationships.
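The data augmentation idea can be sketched as follows. This is a hedged illustration only: the post does not specify how the logically different contexts are built, so the two perturbations used here (reversing a conditional and negating its conclusion) are assumptions, and `negate`, `verbalize`, `to_text`, and `negative_contexts` are hypothetical helper names.

```python
def negate(literal):
    symbol, is_positive = literal
    return (symbol, not is_positive)

def verbalize(literal):
    symbol, is_positive = literal
    return symbol if is_positive else f"not ({symbol})"

def to_text(implications):
    """Render a list of (premise, conclusion) pairs as plain sentences."""
    return " ".join(
        f"If {verbalize(p)}, then {verbalize(q)}." for p, q in implications
    )

def negative_contexts(implications):
    """Yield contexts that differ from the original in exactly one expression.

    Assumed perturbations: reverse the conditional, or negate its conclusion.
    """
    for i, (p, q) in enumerate(implications):
        for variant in ((q, p), (p, negate(q))):
            yield implications[:i] + [variant] + implications[i + 1:]

positive = [(("have keyboarding skills", False), ("able to use a computer", False))]
negatives = list(negative_contexts(positive))

print("original:", to_text(positive))
for negative in negatives:
    print("negative:", to_text(negative))
```

In a contrastive setup, a model would then be trained to score the original context above each of these near-identical but logically different variants for the correct answer, which pushes it to attend to negation and to the direction of each conditional.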

Three panel diagram showing logic-driven context extension framework. More details in the paper.
Figure 3: Logic-driven context extension framework

By topping the leaderboard shown in Figure 1 and surpassing the human performance reported in the ReClor paper, LReasoner has taken an essential step towards deeper logical reasoning by AI. The LReasoner system is also one of the first attempts by researchers to apply machine reasoning to real scenarios. In the future, the Natural Language Computing Group of Microsoft Research Asia will continue to explore new tasks and new methods in the field of machine reasoning and promote research on knowledgeable and interpretable artificial intelligence.
