Large Language Models Cannot Explain Themselves
Large language models can be prompted to produce text. They can also be prompted to produce “explanations” of their output. But these are not really explanations, because they do not accurately reflect the mechanical process underlying the prediction. The illusion that they reflect the model's reasoning process can result in significant harms. These “explanations” can nonetheless be valuable, not for understanding the model, but for promoting critical thinking in its users. I propose a recontextualisation of these “explanations” under the term “exoplanations”, to draw attention to their exogenous nature. I discuss some implications for design and technology, such as the inclusion of appropriate guardrails and responses when models are prompted to generate explanations.