{"id":625260,"date":"2019-12-09T08:16:44","date_gmt":"2019-12-09T16:16:44","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=625260"},"modified":"2019-12-09T08:16:44","modified_gmt":"2019-12-09T16:16:44","slug":"project-petridish-efficient-forward-neural-architecture-search","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/project-petridish-efficient-forward-neural-architecture-search\/","title":{"rendered":"Project Petridish: Efficient forward neural architecture search"},"content":{"rendered":"

\"Animation (opens in new tab)<\/span><\/a><\/p>\n

Having experience in deep learning doesn't hurt when it comes to the often mysterious, time- and cost-consuming process of hunting down an appropriate neural architecture. But truth be told, no one really knows what works best on a new dataset and task. Relying on well-known, top-performing networks provides few guarantees in a space where your dataset can look very different from anything those proven networks have encountered before. For example, a network that worked well on satellite images won't necessarily work well on the selfies and food photos making the rounds on social media. Even when a task dataset is similar to other common datasets and a bit of prior knowledge can be applied by starting with similar architectures, it's challenging to find architectures that meet not only accuracy requirements but also memory and latency constraints, among others, at serving time. These challenges can lead to a frustrating amount of trial and error.

In our paper "Efficient Forward Architecture Search," which is being presented at the 33rd Conference on Neural Information Processing Systems (NeurIPS), we introduce Petridish, a neural architecture search algorithm that opportunistically adds new layers determined to be beneficial to a parent model, resulting in a gallery of models capable of satisfying a variety of constraints for researchers and engineers to choose from. The team behind the ongoing work includes myself; Carnegie Mellon University PhD graduate Hanzhang Hu; John Langford, Partner Research Manager; Rich Caruana, Senior Principal Researcher; Shital Shah, Principal Research Software Engineer; Saurajit Mukherjee, Principal Engineering Manager; and Eric Horvitz, Technical Fellow and Director, Microsoft Research AI.

With Petridish, we seek to increase efficiency and speed in finding suitable neural architectures, making the process easier both for those in the field and for those without deep learning expertise who are interested in machine learning solutions.

Neural architecture search: forward search vs. backward search

The machine learning subfield of neural architecture search (NAS) aims to take the guesswork out of the process and let algorithms search for good architectures. While NAS experienced a resurgence in 2016 and has become a very popular topic (see the AutoML Freiburg-Hannover website for a continuously updated compilation of published papers), the earliest papers on the topic date back to NeurIPS 1988 and NeurIPS 1989. Most of the well-known NAS algorithms today, such as Efficient Neural Architecture Search (ENAS), Differentiable Architecture Search (DARTS), and ProxylessNAS, are examples of backward search. During backward search, smaller networks are sampled from a supergraph, a large architecture containing multiple subarchitectures. A limitation of backward search algorithms is that human domain knowledge is needed to create the supergraph in the first place. In contrast, Petridish is an example of forward search, a paradigm first introduced 30 years ago by Scott Fahlman and Christian Lebiere of Carnegie Mellon University in that 1989 NeurIPS paper. Forward search requires far less human knowledge when it comes to search space design.
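To make the contrast concrete, here is a toy Python sketch, not drawn from any specific NAS system: backward search can only select subarchitectures from inside a human-designed supergraph, while forward search grows a model from a small seed. The operation names and the `score` callable are hypothetical.

```python
import random

# Backward search: a human first designs a supergraph that fixes the candidate
# operations available at each position; the search then only picks a
# subarchitecture from inside it.
SUPERGRAPH = [
    ["conv3x3", "conv5x5", "max_pool"],   # hypothetical choices for layer 1
    ["conv3x3", "sep_conv3x3", "skip"],   # hypothetical choices for layer 2
    ["conv3x3", "conv5x5", "skip"],       # hypothetical choices for layer 3
]

def sample_subarchitecture(supergraph):
    """Every model backward search can return lives inside the supergraph."""
    return [random.choice(choices) for choices in supergraph]

def grow_forward(seed, candidate_ops, score, steps):
    """Forward search: start from a small seed and repeatedly append whichever
    candidate layer currently looks most useful; no supergraph is required.
    `score` is a hypothetical callable estimating the benefit of an addition."""
    model = list(seed)
    for _ in range(steps):
        model.append(max(candidate_ops, key=lambda op: score(model, op)))
    return model
```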

Petridish, which was also inspired by gradient boosting, produces as its search output a gallery of models to choose from, incorporates stop-forward and stop-gradient layers to more efficiently identify beneficial candidate layers for building that gallery, and uses asynchronous training.
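Here is a minimal PyTorch sketch of how stop-gradient and stop-forward connections can let a candidate layer train alongside a parent model without affecting it. The wiring, tensor shapes, and `candidate_layer` below are illustrative assumptions, not the project's actual implementation.

```python
import torch

def stop_gradient(x: torch.Tensor) -> torch.Tensor:
    # Forward: identity. Backward: blocks gradients from flowing back into
    # the parent model's weights through the candidate branch.
    return x.detach()

def stop_forward(x: torch.Tensor) -> torch.Tensor:
    # Forward: contributes zero, so the parent's predictions and loss are
    # unchanged. Backward: passes gradients through unchanged, so the
    # candidate still receives a learning signal from the task loss.
    return x - x.detach()

# Hypothetical wiring of one candidate layer onto a parent activation.
parent_activation = torch.randn(8, 64, requires_grad=True)  # stands in for a parent feature map
candidate_layer = torch.nn.Linear(64, 64)                   # candidate operation being evaluated

candidate_out = candidate_layer(stop_gradient(parent_activation))
merged = parent_activation + stop_forward(candidate_out)    # forward value equals parent_activation
```

In the forward pass, `merged` is numerically identical to `parent_activation`, so the parent behaves as if the candidate were not there; in the backward pass, the candidate receives the gradient of the loss at that point, which, in the spirit of gradient boosting, indicates how useful the candidate would be if it were actually added.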

\"Figure

(opens in new tab)<\/span><\/a> Figure 1: Petridish, a neural architecture search algorithm that grows a nominal seed model during search by opportunistically adding layers as needed, comprises three phases. Phase 0 starts with the small parent model. In Phase 1, a large number of candidates is considered for addition to the parent. If a candidate is promising, then it\u2019s added to the parent in Phase 2. Models in Phase 2 that lie near the boundary of the current estimate of the Pareto frontier (see Figure 2) are then added to the pool of parent models in Phase 0 so they have the chance to grow further.<\/p><\/div>\n
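Putting the phases together, the following Python pseudocode sketches the search loop summarized in Figure 1. The helper functions passed in (`pick_parent`, `train_with_candidates`, `select_top_candidates`, `grow_and_finetune`, `near_pareto_frontier`) are hypothetical placeholders rather than the actual Petridish API, and the real system runs these steps asynchronously across many parents and candidates.

```python
# A schematic sketch of the three-phase search loop, under the assumption
# that suitable implementations of the helper callables are supplied.
def petridish_search(seed_model, candidate_ops, iterations,
                     pick_parent, train_with_candidates,
                     select_top_candidates, grow_and_finetune,
                     near_pareto_frontier):
    parent_pool = [seed_model]      # Phase 0: pool of parent models
    gallery = [seed_model]          # search output: a gallery of models

    for _ in range(iterations):
        parent = pick_parent(parent_pool)

        # Phase 1: attach many candidate layers behind stop-gradient /
        # stop-forward connections and estimate how much each would help.
        candidate_scores = train_with_candidates(parent, candidate_ops)
        promising = select_top_candidates(candidate_scores)

        # Phase 2: add the promising candidates to the parent for real and
        # train the resulting child model.
        child = grow_and_finetune(parent, promising)
        gallery.append(child)

        # Children near the current accuracy-vs-cost Pareto frontier go back
        # into the parent pool (Phase 0) so they can keep growing.
        if near_pareto_frontier(child, gallery):
            parent_pool.append(child)

    return gallery
```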

Overview of Petridish

There are three main phases to Petridish: