{"id":235465,"date":"2015-02-10T06:00:00","date_gmt":"2015-02-10T06:00:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/inside_microsoft_research\/2015\/02\/10\/microsoft-researchers-algorithm-sets-imagenet-challenge-milestone\/"},"modified":"2018-08-07T21:46:23","modified_gmt":"2018-08-08T04:46:23","slug":"microsoft-researchers-algorithm-sets-imagenet-challenge-milestone","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-algorithm-sets-imagenet-challenge-milestone\/","title":{"rendered":"Microsoft Researchers’ Algorithm Sets ImageNet Challenge Milestone"},"content":{"rendered":"

Posted by Richard Eckel<\/span><\/p>\n

The race among computer scientists to build the world\u2019s most accurate computer vision<\/span><\/a> system is more of a marathon than a sprint.<\/p>\n

The race\u2019s new leader is a team of Microsoft researchers in Beijing, which this week published a paper reporting that its computer vision system, based on deep convolutional neural networks<\/span><\/a> (CNNs), had for the first time surpassed people\u2019s ability to classify objects in the 1000-class ImageNet challenge.<\/p>\n

In their paper, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification<\/span><\/a><\/em>, the researchers say their system achieved a 4.94 percent error rate on the 1000-class ImageNet 2012 classification dataset, which contains about 1.2 million training images, 50,000 validation images, and 100,000 test images. In previous experiments, humans have achieved an estimated 5.1 percent error rate<\/em>.<\/p>\n

\u201cTo our knowledge, our result is the first to surpass human-level performance\u2026on this visual recognition challenge,\u201d the researchers wrote.<\/p>\n\n\n\n\n
\"Jian<\/td>\n<\/tr>\n
Jian Sun, principal researcher at Microsoft<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

The research team comprises Jian Sun<\/span><\/a>, a 38-year-old principal researcher; Kaiming He<\/span><\/a>, a 30-year-old researcher in Microsoft Research Asia\u2019s Visual Computing Group<\/span><\/a>; and two academic interns, Xiangyu Zhang of Xi\u2019an Jiaotong University and Shaoqing Ren of the University of Science and Technology of China.<\/p>\n

Sun, who joined Microsoft Research Asia<\/span><\/a> a dozen years ago, earned his bachelor\u2019s, master\u2019s and Ph.D. degrees in electrical engineering from Xi\u2019an Jiaotong University, where in 2001 he was a student of Harry Shum<\/span><\/a> (@harryshum<\/span><\/a>), Microsoft\u2019s executive vice president, Technology and Research, and one of the founding members of Microsoft\u2019s research organization in China. Shum, an IEEE Fellow and an ACM Fellow for his contributions to computer vision and computer graphics, is incredibly proud of his former student\u2019s accomplishment.<\/p>\n

\u201cThe first project Jian worked on with me in 2001, together with Professor Nanning Zheng of Xi\u2019an Jiaotong University, was stereo reconstruction with belief propagation. Jian was among the first to realize the power of using Bayesian belief propagation to solve a large class of computer vision problems with Markov networks<\/span><\/a> such as stereo,\u201d Shum said.<\/p>\n

Shum is especially proud of Sun\u2019s Microsoft achievements. \u201cMany of Jian\u2019s research results have been incorporated within Microsoft products, and I am especially excited about the potential of his latest work with deeper neural nets.\u201d<\/p>\n

Sun credits the team\u2019s most recent achievement to two key ideas: more adaptable nonlinear units for the neural network, and a better training algorithm that makes the network more powerful.<\/p>\n

In the paper, the researchers note that the rectifier neuron<\/span><\/a> is one of several keys to the recent success of deep neural networks being applied to computer vision challenges.<\/p>\n

\u201cIn this paper, we investigate neural networks from two aspects particularly driven by the rectifiers,\u201d the researchers wrote. \u201cFirst, we propose a new generalization of ReLU, which we call Parametric Rectified Linear Unit (PReLU). This activation function adaptively learns the parameters of the rectifiers, and improves accuracy at negligible extra computational cost. Second, we study the difficulty of training rectified models that are very deep. By explicitly modeling the nonlinearity of the rectifiers (ReLU\/PReLU), we derive a theoretically sound initialization method, which helps with convergence of very deep models (e.g., with 30 weight layers) trained directly from scratch. This gives us more flexibility to explore more powerful network architectures.\u201d<\/p>\n
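
The two ideas quoted above can be illustrated with a minimal sketch in plain Python. It assumes the PReLU form f(y) = max(0, y) + a * min(0, y) and a zero-mean Gaussian initialization with variance 2 / ((1 + a^2) * fan_in), which is our reading of the paper's derivation. The function names, the layer width, the slope a = 0.25 and the toy fully connected network are illustrative assumptions, not details taken from this post.

```python
import math
import random


def prelu(y, a):
    # Parametric ReLU: identity for positive inputs, slope a for the rest.
    # a = 0 gives plain ReLU; in a real network a is learned during training.
    return y if y > 0 else a * y


def rectifier_init_std(fan_in, a=0.0):
    # Standard deviation for zero-mean Gaussian weights, assuming the
    # variance 2 / ((1 + a^2) * fan_in) described in the lead-in.
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def last_layer_second_moment(depth=30, width=128, a=0.25, std=None, seed=0):
    # Push one random input through `depth` fully connected PReLU layers
    # (a hypothetical toy network) and return the mean squared
    # pre-activation of the final layer.
    rng = random.Random(seed)
    if std is None:
        std = rectifier_init_std(width, a)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    y = x
    for _ in range(depth):
        # Each output unit gets its own freshly sampled weight vector.
        y = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(width)]
        x = [prelu(v, a) for v in y]
    return sum(v * v for v in y) / width
```

Calling last_layer_second_moment() with the default scaled initialization should keep the signal at a usable magnitude after 30 layers, whereas passing a small fixed value such as std=0.01 lets it shrink toward zero. That vanishing signal is one plausible reading of the training difficulty the researchers describe for very deep rectified models.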

Although excited that their algorithm surpassed human-level performance on this benchmark, the paper\u2019s authors, like other researchers in the field, emphasize that computer vision still cannot match human vision in general. They note that the system still struggles with understanding objects and with cases where contextual understanding or high-level knowledge of a scene is required.<\/p>\n

\u201cWhile our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general\u2026On recognizing elementary object categories\u2026machines still have obvious errors in cases that are trivial for humans. Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks.\u201d<\/p>\n

With the Chinese New Year (the year of the sheep) approaching on Feb. 19, Sun uses sheep to explain that human-level understanding is still more developed than computer image classification.<\/p>\n

\u201cHumans have no trouble distinguishing between a sheep and a cow. But computers are not perfect with these simple tasks,\u201d Sun explains. \u201cHowever, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans. The computer can be trained to look at the detail, texture, shape and context of the image and see distinctions that can\u2019t be observed by humans.\u201d<\/p>\n

The work of Sun, He and team isn\u2019t confined to research; it\u2019s already being applied to Microsoft services, including Bing image search<\/span><\/a> and OneDrive<\/span><\/a>, the company\u2019s online storage solution. In a recent blog post<\/span><\/a>, Douglas Pearce (@douglasprc), group program manager, noted how OneDrive can now automatically recognize content in your photos.<\/p>\n

\u201cOur users will have access to automatically grouped collections of photos and they can easily search for specific ones. You\u2019ll be able to quickly find things such as \u2018people,\u2019 \u2018dogs,\u2019 \u2018whiteboard,\u2019 \u2018beach,\u2019 \u2018sunsets,\u2019 and dozens of other terms. This makes it even easier to add your photos in to presentations for school, to relive a specific memory, or to share something important with all of your friends on Facebook,\u201d Pearce said.<\/p>\n

Pearce later suggested that readers interested in how this technology works read this article<\/span><\/a>, which we posted last fall, about work by these same researchers that speeds deep-learning object-detection systems by as much as 100 times while maintaining accuracy. The team\u2019s advance was documented in the research paper Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition<\/span><\/a><\/em>.<\/p>\n

\u201cThe Visual Computing team here in Beijing has been devoted to pushing the state-of-art in computer vision, with the ultimate goal of enabling computers to emulate the perceptual capability of humans. I\u2019m proud of their achievements over the years, which have not only impacted the academic world through the contribution of high-quality publications, but also empowered Microsoft products through technology transfers<\/span><\/a>,\u201d said Hsiao-Wuen Hon<\/span><\/a>, chairman of Microsoft\u2019s Asia-Pacific R&D Group, and managing director of Microsoft Research Asia.<\/p>\n

The computer vision marathon gained momentum in 2010 when scientists from Stanford, Princeton and Columbia universities started the Large Scale Visual Recognition Challenge<\/span><\/a>. According to an August 2014 New York Times article by noted technology industry journalist John Markoff (@markoff), accuracy almost doubled in the 2014 competition and error rates were cut in half. Most recently, Baidu researchers<\/span><\/a> have published a paper in which they claim to have achieved \u201ca top-5 error rate of 5.33%\u201d on the ImageNet classification challenge.<\/p>\n

The marathon continues; this year\u2019s challenge<\/span><\/a>\u00a0will take place in December. But that isn\u2019t the primary focus of Sun, He and team. \u201cOur goal is to develop systems that are as good as, or better, at recognizing images than humans on many useful applications,\u201d Sun said. \u201cFor that to happen, we need more training data and more real-world test scenarios. It\u2019s our work with Bing, OneDrive and other services that will help us improve the robustness of our algorithm.\u201d<\/p>\n

See also:<\/p>\n