{"id":235465,"date":"2015-02-10T06:00:00","date_gmt":"2015-02-10T06:00:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/inside_microsoft_research\/2015\/02\/10\/microsoft-researchers-algorithm-sets-imagenet-challenge-milestone\/"},"modified":"2018-08-07T21:46:23","modified_gmt":"2018-08-08T04:46:23","slug":"microsoft-researchers-algorithm-sets-imagenet-challenge-milestone","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-algorithm-sets-imagenet-challenge-milestone\/","title":{"rendered":"Microsoft Researchers&#8217; Algorithm Sets ImageNet Challenge Milestone"},"content":{"rendered":"<p class=\"posted-by\">Posted by <span class=\"author\">Richard Eckel<\/span><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/msdnshared.blob.core.windows.net\/media\/TNBlogsFS\/prod.evol.blogs.technet.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/90\/35\/imagenet2-550.jpg\"><img decoding=\"async\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/TNBlogsFS\/prod.evol.blogs.technet.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/90\/35\/imagenet2-550.jpg\" alt=\" \" border=\"0\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p>The race among computer scientists to build the world\u2019s most accurate <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"computer vision research at Microsoft\" href=\"http:\/\/research.microsoft.com\/en-us\/about\/our-research\/computer-vision.aspx\" target=\"_blank\" rel=\"noopener\">computer vision<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> system is more of a marathon than a sprint.<\/p>\n<p>The race\u2019s new leader is a team of Microsoft researchers in Beijing, which this week published a paper in which they noted their computer vision system based on deep <a 
class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"convolutional neural networks\" href=\"http:\/\/en.wikipedia.org\/wiki\/Convolutional_neural_network\" target=\"_blank\" rel=\"noopener\">convolutional neural networks<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (CNNs) had for the first time eclipsed the abilities of people to classify objects defined in the ImageNet 1000 challenge.<\/p>\n<p>In their paper, <em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification\" href=\"http:\/\/arxiv.org\/abs\/1502.01852\" target=\"_blank\" rel=\"noopener\">Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em>, the researchers say their system achieved a 4.94 percent error rate on the 1000-class ImageNet 2012 classification dataset, which contains about 1.2 million training images, 50,000 validation images, and 100,000 test images. 
In previous experiments, humans have achieved an estimated <em>5.1 percent error rate<\/em>.<\/p>\n<p>\u201cTo our knowledge, our result is the first to surpass human-level performance\u2026on this visual recognition challenge,\u201d the researchers wrote.<\/p>\n<table style=\"margin: 8px; width: 260px;\" border=\"0\" cellspacing=\"1\" cellpadding=\"5\" align=\"right\">\n<tbody>\n<tr>\n<td><img decoding=\"async\" style=\"float: right; margin: 10px;\" title=\"Jian Sun\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/TNBlogsFS\/prod.evol.blogs.technet.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/90\/35\/jian-sun-imagenet_250.jpg\" alt=\"Jian Sun\" \/><\/td>\n<\/tr>\n<tr>\n<td><strong>Jian Sun, principal researcher at Microsoft<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The research team comprises 38-year-old <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Microsoft researcher Jian Sun\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/jiansun\/\" target=\"_blank\" rel=\"noopener\">Jian Sun<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, principal researcher, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Kaiming He\" href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/kahe\/\" target=\"_blank\" rel=\"noopener\">Kaiming He<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a 30-year-old researcher in Microsoft Research Asia\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Visual Computing Group\" href=\"http:\/\/research.microsoft.com\/en-us\/groups\/vc\/\" target=\"_blank\" rel=\"noopener\">Visual Computing Group<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and two academic interns, Xiangyu Zhang of Xi\u2019an Jiaotong University and Shaoqing Ren of the University of Science and Technology of China.<\/p>\n<p>Sun, who joined <a 
class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Microsoft Research Asia\" href=\"http:\/\/msra.cn\/zh-cn\/\" target=\"_blank\" rel=\"noopener\">Microsoft Research Asia<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> a dozen years ago, earned his bachelor\u2019s, master\u2019s and Ph.D. degrees in electrical engineering from Xi\u2019an Jiaotong University, where in 2001 he was a student of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Harry Shum\" href=\"http:\/\/news.microsoft.com\/exec\/harry-shum\/\" target=\"_blank\" rel=\"noopener\">Harry Shum<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Follow Harry Shum on Twitter\" href=\"https:\/\/x.com\/harryshum\" target=\"_blank\" rel=\"noopener\">@harryshum<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>), Microsoft\u2019s executive vice president, Technology and Research, and one of the founding members of Microsoft\u2019s research organization in China. Shum, an IEEE Fellow and an ACM Fellow for his contributions to computer vision and computer graphics, is incredibly proud of his former student\u2019s accomplishment.<\/p>\n<p>\u201cThe first project Jian worked on with me in 2001, together with Professor Nanning Zheng of Xi\u2019an Jiaotong University, was stereo reconstruction with belief propagation. 
Jian was among the first to realize the power of using Bayesian belief propagation to solve a large class of computer vision problems with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Markov networks\" href=\"http:\/\/en.wikipedia.org\/wiki\/Markov_random_field\" target=\"_blank\" rel=\"noopener\">Markov networks<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> such as stereo,\u201d Shum said.<\/p>\n<p>Shum is especially proud of Sun\u2019s Microsoft achievements. \u201cMany of Jian\u2019s research results have been incorporated within Microsoft products, and I am especially excited about the potential of his latest work with deeper neural nets.\u201d<\/p>\n<p>Sun credits the team\u2019s most recent achievement to two key ideas: the development of more adaptable nonlinear neural units of the neural network, and a better training algorithm that makes the neural network more powerful.<\/p>\n<p>In the paper, the researchers note that the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"rectifier neuron\" href=\"http:\/\/en.wikipedia.org\/wiki\/Rectifier_(neural_networks)\" target=\"_blank\" rel=\"noopener\">rectifier neuron<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is one of several keys to the recent success of deep neural networks being applied to computer vision challenges.<\/p>\n<p>\u201cIn this paper, we investigate neural networks from two aspects particularly driven by the rectifiers,\u201d the researchers wrote. \u201cFirst, we propose a new generalization of ReLU, which we call Parametric Rectified Linear Unit (PReLU). This activation function adaptively learns the parameters of the rectifiers, and improves accuracy at negligible extra computational cost. Second, we study the difficulty of training rectified models that are very deep. 
By explicitly modeling the nonlinearity of the rectifiers (ReLU\/PReLU), we derive a theoretically sound initialization method, which helps with convergence of very deep models (e.g., with 30 weight layers) trained directly from scratch. This gives us more flexibility to explore more powerful network architectures.\u201d<\/p>\n<p>Although excited that their algorithm has edged past the estimated human error rate on this benchmark, the paper\u2019s authors, like other researchers in the field, emphasize that computer vision still cannot match human vision in general. They note that the system still has trouble recognizing some objects, particularly where contextual understanding or high-level knowledge of a scene is required.<\/p>\n<p>\u201cWhile our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general\u2026On recognizing elementary object categories\u2026machines still have obvious errors in cases that are trivial for humans. Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks.\u201d<\/p>\n<p>With the Chinese New Year (the year of the sheep) approaching on Feb. 19, Sun uses sheep to explain that human-level understanding is still more developed than computer image classification.<\/p>\n<p>\u201cHumans have no trouble distinguishing between a sheep and a cow. But computers are not perfect with these simple tasks,\u201d Sun explains. \u201cHowever, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans. 
The computer can be trained to look at the detail, texture, shape and context of the image and see distinctions that can\u2019t be observed by humans.\u201d<\/p>\n<p>The work of Sun, He and team isn\u2019t confined to research; it\u2019s already being applied to Microsoft services, including <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Bing image search\" href=\"http:\/\/www.bing.com\/images\/search?q=sheep&qs=n&form=QBILPG&pq=sheep&sc=8-2&sp=-1&sk=\" target=\"_blank\" rel=\"noopener\">Bing image search<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"OneDrive\" href=\"https:\/\/onedrive.live.com\/about\/en-us\/\" target=\"_blank\" rel=\"noopener\">OneDrive<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, the company\u2019s online storage solution. In a recent <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"View, manage, share photos in OneDrive\" href=\"https:\/\/blog.onedrive.com\/introducing-an-all-new-way-to-view-manage-and-share-your-photos-in-onedrive\/\" target=\"_blank\" rel=\"noopener\">blog post<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Douglas Pearce (@douglasprc), group program manager, noted how OneDrive now can automatically recognize content in your photos.<\/p>\n<p>\u201cOur users will have access to automatically grouped collections of photos and they can easily search for specific ones. You\u2019ll be able to quickly find things such as \u2018people,\u2019 \u2018dogs,\u2019 \u2018whiteboard,\u2019 \u2018beach,\u2019 \u2018sunsets,\u2019 and dozens of other terms. 
This makes it even easier to add your photos in to presentations for school, to relive a specific memory, or to share something important with all of your friends on Facebook,\u201d Pearce said.<\/p>\n<p>He later suggested that readers interested in how this technology works read <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Deep-learning and object detection\" href=\"http:\/\/research.microsoft.com\/en-us\/news\/features\/spp-102914.aspx\" target=\"_blank\" rel=\"noopener\">this article<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> which we posted last fall about the work of these same researchers that speeds deep-learning object-detection systems by as many as 100 times, yet maintains accuracy. The team\u2019s advance was documented in this research paper, <em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition\" href=\"http:\/\/arxiv.org\/pdf\/1406.4729v1.pdf\" target=\"_blank\" rel=\"noopener\">Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em>.<\/p>\n<p>\u201cThe Visual Computing team here in Beijing has been devoted to pushing the state-of-art in computer vision, with the ultimate goal of enabling computers to emulate the perceptual capability of humans. 
I\u2019m proud of their achievements over the years, which have not only impacted the academic world through the contribution of high-quality publications, but also <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Product Contributions from Microsoft Research\" href=\"http:\/\/research.microsoft.com\/en-us\/about\/techtransfer\/default.aspx\" target=\"_blank\" rel=\"noopener\">empowered Microsoft products through technology transfers<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d said <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Hsiao-Wuen Hon\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/hon\/\" target=\"_blank\" rel=\"noopener\">Hsiao-Wuen Hon<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, chairman of Microsoft\u2019s Asia-Pacific R&D Group, and managing director of Microsoft Research Asia.<\/p>\n<p>The computer vision marathon gained momentum in 2010 when scientists from Stanford, Princeton and Columbia universities started the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Large Scale Visual Recognition Challenge\" href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2014\/\" target=\"_blank\" rel=\"noopener\">Large Scale Visual Recognition Challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. According to an August 2014 New York Times article by noted technology industry journalist John Markoff (@markoff), accuracy almost doubled in the 2014 competition and error rates were cut in half. 
Most recently, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Baidu researchers\" href=\"http:\/\/arxiv.org\/abs\/1501.02876\" target=\"_blank\" rel=\"noopener\">Baidu researchers<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> have published a paper in which they claim to have achieved \u201ca top-5 error rate of 5.33%\u201d on the ImageNet classification challenge.<\/p>\n<p>The marathon continues; this year\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"2015 ImageNet challenge\" href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2015\/\" target=\"_blank\" rel=\"noopener\">challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0will take place in December. But that isn\u2019t the primary focus of Sun, He and team. \u201cOur goal is to develop systems that are as good as, or better, at recognizing images than humans on many useful applications,\u201d Sun said. \u201cFor that to happen, we need more training data and more real-world test scenarios. 
It\u2019s our work with Bing, OneDrive and other services that will help us improve the robustness of our algorithm.\u201d<\/p>\n<p>See also:<\/p>\n<ul>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Microsoft's Project Adam\" href=\"http:\/\/research.microsoft.com\/en-us\/news\/features\/dnnvision-071414.aspx\" target=\"_blank\" rel=\"noopener\">Microsoft&#8217;s Project Adam<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"Harry Shum Project Adam demo\" href=\"https:\/\/www.youtube.com\/watch?v=zOPIvC0MlA4\" target=\"_blank\" rel=\"noopener\">Harry Shum Project Adam demo<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"ImageNet 2014 Competition\" href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2014\/index\" target=\"_blank\" rel=\"noopener\">ImageNet 2014 Competition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" title=\"ImageNet Large Scale Visual Recognition Challenge\" href=\"http:\/\/arxiv.org\/abs\/1409.0575\" target=\"_blank\" rel=\"noopener\">ImageNet Large Scale Visual Recognition Challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Posted by Richard Eckel The race among computer scientists to build the world\u2019s most accurate computer vision system is more of a marathon than a sprint. 
The race\u2019s new leader is a team of Microsoft researchers in Beijing, which this week published a paper in which they noted their computer vision system based on deep [&hellip;]<\/p>\n","protected":false},"author":30766,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[194471,194480,1],"tags":[186897,201109,195766,201951,202183,202305,196903,204507],"research-area":[13562],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-235465","post","type-post","status-publish","format-standard","hentry","category-computer-vision","category-graphics-and-multimedia","category-research-blog","tag-computer-vision","tag-convolutional-neural-networks","tag-harry-shum","tag-imagenet","tag-jian-sun","tag-kaiming-he","tag-project-adam","tag-visual-computing-group","msr-research-area-computer-vision","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"February 10, 2015","formattedExcerpt":"Posted by Richard Eckel The race among computer scientists to build the world\u2019s most accurate computer vision system is more of a marathon than a sprint. 
The race\u2019s new leader is a team of Microsoft researchers in Beijing, which this week published a paper in&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/235465"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/30766"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=235465"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/235465\/revisions"}],"predecessor-version":[{"id":499796,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/235465\/revisions\/499796"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=235465"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=235465"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=235465"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=235465"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=235465"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=235465"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=235465"},{"taxonomy":"msr-post-option"
,"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=235465"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=235465"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=235465"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=235465"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}