{"id":474486,"date":"2018-03-22T09:23:26","date_gmt":"2018-03-22T16:23:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=474486"},"modified":"2018-03-22T09:23:26","modified_gmt":"2018-03-22T16:23:26","slug":"microsoft-tsinghua-university-work-together-open-academic-data-research","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-tsinghua-university-work-together-open-academic-data-research\/","title":{"rendered":"Microsoft and Tsinghua University Work Together on Open Academic Data Research"},"content":{"rendered":"
<\/p>\n

In a recent collaboration, Microsoft and China\u2019s Tsinghua University released an academic graph named Open Academic Graph (OAG). This billion-scale academic graph integrates the current Microsoft Academic Graph (MAG) and Tsinghua’s AMiner academic graph. Specifically, it contains metadata for 155 million academic papers from AMiner and 166 million papers from MAG. By consolidating the metadata of both, it generates nearly 65 million matching relationships between the two academic graphs [1].<\/p>\n

Picture 1: Connections between Tsinghua University AMiner and Microsoft Academic Graph<\/p><\/div>\n

Constructing the billion-scale OAG is challenging for three reasons: academic data is distributed heterogeneously across the different academic graphs, names are subject to homonyms and synonyms, and data matching must be accurate. Some examples:<\/p>\n