Programming puzzle examples<\/h3>
<\/a>
<\/div>
Can computers generate valuable, novel challenges?<\/h2>
Surprisingly, language models such as Codex and GPT-Neo can create novel puzzles when prompted with a set of example puzzles (without solutions) and asked to generate “more like these.” What makes a challenge good? Rather than focusing on interesting<\/em> challenges, we prioritize useful<\/em> ones. To measure usefulness, we have the language model generate puzzles, solve them, and train on its own puzzle-solution pairs. We then assess whether this training improves its performance on a hidden test set of puzzles. (Solutions to our published puzzles may by now have leaked into AI training sets, so, with the help of champion competitive programmers, we created a secret test set that remains unpublished and can be used for uncontaminated evaluation.) In our experiments with small- to medium-sized language models (a few billion parameters, far fewer than the latest GPT models), self-training more than doubled the models’ success rates.<\/p>
Risks and limitations<\/h2>
This research was conducted before GPT-4 was released. While we believe similar techniques may help GPT-4 improve its own programming ability, this remains an active area of research. We are still working to understand these models’ capabilities and limitations, their appropriate use, and the potential consequences of stronger programming capabilities. One key limitation of puzzles is that a solution may work only for the specific instance provided rather than for the general problem. However, this limitation is also an advantage for human-AI alignment. Other AI challenges carry inherent ambiguities that can lead to unintended consequences when objectives are imprecisely defined (for example, an AI-designed math-tutoring app that unintentionally becomes addictive). Our programming puzzles, by contrast, consist of exactly those standalone problems whose solutions can be perfectly verified against a precise objective. Any work that substantially advances AI programming capabilities still carries the risk of being used in other systems with unintended consequences. For that reason, we continue to encourage great care before deploying systems that use AI-generated code.<\/p>
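To make the instance-specific limitation concrete, here is a minimal sketch built around a hypothetical puzzle that does not come from the published puzzle set. The names `N`, `sat`, `sol_general`, and `sol_hardcoded` are illustrative assumptions. Because the checker is the entire objective, it returns an exact True or False. It cannot, however, distinguish a general method from an answer that only fits this one instance.

```python
# Hypothetical puzzle for illustration (not from the published puzzle set).
N = 1_234_567  # the one specific instance this puzzle asks about


def sat(x: int, n: int = N) -> bool:
    """Checker: True iff x is a non-trivial factor of n."""
    return 1 < x < n and n % x == 0


def sol_general(n: int = N) -> int:
    """General method: trial division returns the smallest factor > 1."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    raise ValueError("n is prime, so this puzzle has no solution")


def sol_hardcoded() -> int:
    """Works only for this instance, because 1_234_567 == 127 * 9721."""
    return 9721


# The checker accepts both answers; it verifies the answer, not the method.
print(sat(sol_general()), sat(sol_hardcoded()))  # True True
```

Both calls print True. Verification is precise about which answers are accepted, but it says nothing about how general the solving method was.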
Examples of programming puzzles for AI self-play<\/h2>
Each puzzle is specified by a short Python program that checks whether a candidate answer is correct. Each solution is a Python program that must output an accepted answer within a limited amount of time.<\/p>
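The puzzle and solution format described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual evaluation harness. Puzzles and solutions are assumed to be Python source strings that define `sat` and `sol` (hypothetical names). The time limit is enforced by running each pair in a fresh interpreter through `subprocess.run(..., timeout=...)`. The last lines assume, for the sake of the sketch, that self-play keeps only candidate solutions that pass this check.

```python
import subprocess
import sys
import textwrap


def verify(puzzle_src: str, solution_src: str, timeout_s: float = 2.0) -> bool:
    """Return True iff sat(sol()) is True within timeout_s seconds.

    Runs the code in a separate Python process so a runaway solution can be
    killed. Note: a subprocess is NOT a security sandbox.
    """
    program = "\n".join([
        textwrap.dedent(puzzle_src),
        textwrap.dedent(solution_src),
        "import sys",
        "sys.exit(0 if sat(sol()) is True else 1)",
    ])
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # the child was killed; too slow counts as a failure
    return result.returncode == 0  # wrong answers and crashes exit non-zero


# Hypothetical puzzle: find s such that "Hello " + s == "Hello world".
PUZZLE = '''
def sat(s: str) -> bool:
    return "Hello " + s == "Hello world"
'''

# Candidate solutions, e.g. as a language model might propose them.
CANDIDATES = [
    'def sol():\n    return "world"\n',             # correct
    'def sol():\n    return "World"\n',             # wrong answer
    'def sol():\n    while True:\n        pass\n',  # never finishes
]

# Assumed filtering step: keep only candidates that pass verification.
verified = [c for c in CANDIDATES if verify(PUZZLE, c, timeout_s=2.0)]
print(len(verified))  # 1
```

Running each check in a separate process is one simple way to make the time limit enforceable, because a hung solution can be killed without affecting the caller. Untrusted generated code would still need real isolation.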