{"id":936864,"date":"2023-05-02T09:00:00","date_gmt":"2023-05-02T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=936864"},"modified":"2023-06-12T10:33:40","modified_gmt":"2023-06-12T17:33:40","slug":"ai-self-play-for-algorithm-design","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/ai-self-play-for-algorithm-design\/","title":{"rendered":"AI self-play for algorithm design"},"content":{"rendered":"\n

This research was accepted by the 2023 International Conference on Learning Representations (ICLR) (opens in new tab)<\/span><\/a>, which is dedicated to the advancement of the branch of artificial intelligence generally referred to as deep learning.<\/em><\/p>\n\n\n\n

\"A
A self-play pipeline for a language model (LM) to improve itself in a fully automatic manner. First, the LM generates novel puzzles based on a training set of handwritten puzzles. Then, the LM attempts to solve each of these puzzles 100 times. In Step 3, the computer (specifically a Python interpreter) filters the candidate solutions for correctness. Finally, the LM is improved by further training on these verified correct solutions to synthetic puzzles, and the process repeats. This process leads to significant improvements as measured on held-out test puzzles, which were also handwritten.<\/figcaption><\/figure>\n\n\n\n

Efficient algorithms are crucial for many purposes, including reducing energy consumption in digital devices. While humans outperform AI systems at designing such algorithms, we show how to improve AI programming abilities using self-play, a technique that has helped AI systems dominate in games such as chess and Go.<\/p>\n\n\n\n

Designing fast and accurate algorithms requires high-level abstract reasoning, which remains difficult for AI systems. Our approach involves having the AI design and solve its own programming challenges, enabling practice on millions of artificial challenges and exploration of problem types not found in public repositories. We detail our work in a new paper, \u201cLanguage Models Can Teach Themselves to Program Better,\u201d (opens in new tab)<\/span><\/a> which we\u2019re presenting at the 2023 International Conference on Learning Representations (ICLR) (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n

<\/div>\n\n\n\n\t

The key challenge and our solution<\/h2>\n\n\n\n

How can an AI system generate novel algorithmic programming problems without knowing the solution<\/em>?<\/p>\n\n\n\n

Our approach uses programming puzzles (opens in new tab)<\/span><\/a> introduced by Microsoft Research in 2021. These puzzles\u2014known in complexity theory as the class of \u201cNP\u201d (opens in new tab)<\/span><\/a> decision problems\u2014are easy to check for correctness (no hidden answer key) but often difficult to solve. In this way, they\u2019re like a Rubik\u2019s cube, where it\u2019s trivial to recognize a solution but hard to find one. Three examples are illustrated below: a novel string challenge and the classic Towers of Hanoi and factoring problems. Programming puzzles can range from trivial to major open problems in algorithms and mathematics, and solving them requires all the major algorithmic techniques, such as dynamic programming and greedy algorithms. However, each puzzle checks only a single input, unlike standard algorithm problems, which require a solution that scales efficiently for all<\/em> inputs and are therefore much harder to test.<\/p>\n\n\n\n\t

\n\t\tProgramming puzzle examples\t<\/h3>\n<\/a>\n\n\n\n
<\/div>\n\n\n\n
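To make the format concrete, here is a minimal sketch of a puzzle in the spirit of the string challenge mentioned above (our own illustration, not code copied from the paper). The checker <code>f<\/code> accepts any string with exactly 1,000 \u201co\u201d characters, no two of them adjacent, and the solution <code>g<\/code> is a short program that produces one:<\/p>\n\n\n\n
<pre><code>def f(s: str):
    # Puzzle checker: the answer must contain exactly 1,000 letters 'o',
    # with no two of them adjacent. As with a Rubik's cube, a correct
    # answer is trivial to recognize even though it takes thought to find.
    return s.count('o') == 1000 and s.count('oo') == 0

def g():
    # One short solution program: alternate 'o' with another letter.
    return 'ho' * 1000

assert f(g())<\/code><\/pre>\n\n\n\n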

Can computers generate valuable, novel challenges?<\/h2>\n\n\n\n

Surprisingly, language models such as Codex and GPT-Neo can indeed create novel puzzles when prompted to generate \u201cmore like these\u201d on a set of example puzzles without solutions. You may wonder what makes a challenge good. Instead of focusing on interesting<\/em>, we prioritize useful<\/em> challenges. Our evaluation has the language model generate, solve, and train on its own puzzles; then we assess whether the training improved its performance on a hidden test set of puzzles. (By now, solutions to our puzzles may have leaked into AI training sets, but with the help of champion competitive programmers, we have created a secret test set that remains unpublished, which can be used for uncontaminated evaluation.) In our experiments with small- to medium-sized language models\u2014with a few billion parameters, much fewer than the latest GPT models\u2014self-training more than doubled success rates.<\/p>\n\n\n\n
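The Python interpreter, rather than the model, is what decides which self-generated solutions count as correct. Here is a minimal sketch of that verification-and-filtering step (our simplification of Step 3 in the figure above; a real harness would also sandbox the code and enforce a time limit):<\/p>\n\n\n\n
<pre><code>def verify(puzzle_src, solution_src):
    # A puzzle defines f(answer) -> bool and a solution defines g() -> answer.
    # Keep the pair only if running the checker on the solution's output
    # returns True; any error or wrong answer means the pair is discarded.
    env = {}
    try:
        exec(puzzle_src, env)    # defines f
        exec(solution_src, env)  # defines g
        return env['f'](env['g']()) is True
    except Exception:
        return False

# Toy self-generated (puzzle, solution) pairs; in the pipeline, only the
# verified pairs become additional fine-tuning data for the language model.
candidate_pairs = [
    ('def f(x): return x * x == 1024', 'def g(): return 32'),  # correct
    ('def f(x): return x * x == 1024', 'def g(): return 31'),  # incorrect
]
verified = [pair for pair in candidate_pairs if verify(*pair)]
print(len(verified))  # prints 1: only the correct pair survives<\/code><\/pre>\n\n\n\n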

Risks and limitations<\/h2>\n\n\n\n

This research was conducted prior to GPT-4\u2019s release. While we believe similar techniques may help GPT-4 self-improve in programming, this is an active area of research as we better understand the capabilities and limitations of these models, as well as their appropriate use and the potential consequences of increased programming capabilities. One key limitation of puzzles is that solutions might only work for the specific instance provided. However, this limitation also serves as an advantage in terms of human-AI alignment. Unlike other AI challenges with inherent ambiguities that could lead to unintended consequences if objectives are imprecisely defined (for example, an AI-designed math-tutor app that may unintentionally become addictive), our programming puzzles encompass exactly those standalone problems that can be perfectly verified for meeting a precise objective. As there remains a risk that any work substantially advancing AI programming capabilities could be used in other systems, with unintended consequences, we continue to encourage taking great care before deploying systems with artificially generated code.<\/p>\n\n\n\n

Examples of programming puzzles for AI self-play<\/h2>\n\n\n\n

Each puzzle is specified by a short Python program that checks a possible answer. Each solution is a Python program that outputs an answer in a limited amount of time.<\/p>\n\n\n\n
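As an illustration of this contract, here is a sketch using the factoring problem mentioned earlier, with a deliberately tiny number (our example, not code from the paper). The puzzle <code>f<\/code> checks a proposed divisor, and the solution <code>g<\/code> computes one within the time limit:<\/p>\n\n\n\n
<pre><code>def f(d: int, n=1009 * 1013):
    # Puzzle: find a nontrivial divisor of n. Checking an answer takes one
    # line, but for a large n the search itself can be extremely hard.
    return d > 1 and d != n and n % d == 0

def g(n=1009 * 1013):
    # Solution program: trial division, fast enough for this tiny instance.
    return next(d for d in range(2, n) if n % d == 0)

assert f(g())<\/code><\/pre>\n\n\n\n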

Example 1: Towers of Hanoi<\/h3>\n\n\n\n
\"A<\/figure>\n\n\n\n

The goal of the well-known Towers of Hanoi (opens in new tab)<\/span><\/a> puzzle is to move all the disks from the first tower to the last tower, one by one, without ever putting a bigger disk on top of a smaller disk. It\u2019s easy to check that a solution is correct but hard to find a correct solution. Even though the number of steps required to solve it is exponential in the number of disks, there\u2019s a solution in the form of a short program that is often used to teach recursion. The clever solution program that outputs the moves is easier to find than the sequence of moves itself. Here are the programming puzzle and solution:<\/p>\n\n\n\n
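Below is a sketch of that pair in the puzzle format, paraphrased rather than copied verbatim from the paper. The checker <code>f<\/code> replays a proposed list of moves, rejecting any move that places a larger disk on a smaller one, and accepts only if every disk ends up on the last tower; the solution <code>g<\/code> is the short recursive program that outputs the exponentially long move sequence:<\/p>\n\n\n\n
<pre><code>from typing import List

def f(moves: List[List[int]], num_disks=8):
    # Puzzle checker: moves is a list of [source, target] tower indices.
    # Each move takes the top disk off the source tower; a larger disk may
    # never be placed on a smaller one. The answer is accepted only if all
    # disks end up on the last tower.
    towers = [list(range(num_disks, 0, -1)), [], []]
    for src, dst in moves:
        disk = towers[src].pop()
        assert not towers[dst] or towers[dst][-1] > disk, 'illegal move'
        towers[dst].append(disk)
    return towers[2] == list(range(num_disks, 0, -1))

def g(num_disks=8):
    # Solution: the classic recursive strategy, a few lines of code that
    # output the exponentially long sequence of moves.
    def solve(n, src, spare, dst):
        if n == 0:
            return []
        return (solve(n - 1, src, dst, spare)
                + [[src, dst]]
                + solve(n - 1, spare, src, dst))
    return solve(num_disks, 0, 1, 2)

assert f(g())<\/code><\/pre>\n\n\n\n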