{"id":986238,"date":"2023-11-28T17:43:34","date_gmt":"2023-11-29T01:43:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=986238"},"modified":"2023-11-28T17:43:35","modified_gmt":"2023-11-29T01:43:35","slug":"the-power-of-prompting","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/the-power-of-prompting\/","title":{"rendered":"The Power of Prompting"},"content":{"rendered":"\n

Today, we published an exploration of the power of prompting strategies, demonstrating how the generalist GPT-4 model can perform as a specialist on medical challenge problem benchmarks. The study shows that GPT-4 can outperform, by a significant margin, a leading model that was fine-tuned specifically for medical applications on the same benchmarks. These results join other recent studies showing that prompting strategies alone can evoke this kind of domain-specific expertise from generalist foundation models.

Figure 1: Visual illustration of Medprompt components and their additive contributions to performance on the MedQA benchmark. The prompting strategy combines kNN-based few-shot example selection, GPT-4-generated chain-of-thought prompting, and answer-choice shuffled ensembling.
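As a rough illustration of the first Medprompt component named above, the kNN-based few-shot selection step picks the training examples most similar to the test question and uses them as in-context demonstrations. The sketch below is an assumption-laden toy version: `embed()` is a hypothetical stand-in for a real text-embedding model (here just a bag-of-words counter for illustration), and the question pool is invented.

```python
# Minimal sketch of kNN-based few-shot example selection (one Medprompt
# component). NOT the paper's implementation: embed() is a hypothetical
# stand-in for a real text-embedding model, using bag-of-words counts
# purely so the example is self-contained and runnable.
from collections import Counter
import math

def embed(text):
    # Toy "embedding": token-count vector (a real system would call an
    # embedding model here).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_few_shot(question, train_pool, k=2):
    """Return the k pool examples nearest to the test question."""
    q_vec = embed(question)
    ranked = sorted(
        train_pool,
        key=lambda ex: cosine(q_vec, embed(ex["question"])),
        reverse=True,
    )
    return ranked[:k]

# Invented example pool, for illustration only.
pool = [
    {"question": "Which drug treats hypertension?", "answer": "..."},
    {"question": "What causes type 2 diabetes?", "answer": "..."},
    {"question": "Which antibiotic covers MRSA?", "answer": "..."},
]
chosen = select_few_shot("What is first-line therapy for hypertension?", pool, k=2)
```

The selected examples would then be formatted into the prompt ahead of the test question; the chain-of-thought and choice-shuffle ensembling components operate on top of this selection.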

During early evaluations of the capabilities of GPT-4, we were excited to see glimmers of general problem-solving skills, with surprising polymathic capabilities of abstraction, generalization, and composition, including the ability to weave together concepts across disciplines. Beyond these general reasoning powers, we discovered that GPT-4 could be steered via prompting to serve as a domain-specific specialist in numerous areas. Previously, eliciting these capabilities required fine-tuning language models with specially curated data to achieve top performance in specific domains. This raises the question of whether more extensive training of generalist foundation models might reduce the need for fine-tuning.

In a study shared in March, we demonstrated how very simple prompting strategies revealed GPT-4's strengths in medical knowledge without special fine-tuning. The results showed how the "out-of-the-box" model could ace a battery of medical challenge problems with basic prompts. In our more recent study, we show how composing several prompting strategies into a method we refer to as "Medprompt" can efficiently steer GPT-4 to achieve top performance. In particular, we find that GPT-4 with Medprompt: