Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
- Oren Barkan
- Edan Hauon
- Avi Caciularu
- Ori Katz
- Itzik Malkiel
- Omri Armstrong
- Noam Koenigstein
30th ACM International Conference on Information & Knowledge Management (CIKM 2021)
Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM), a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
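As a rough illustration of the gradient-times-attention idea the abstract describes, the sketch below scores input tokens by combining a BERT classifier's self-attention maps with ReLU-gated gradients of the predicted logit. The model checkpoint, the aggregation over layers, heads, and query positions, and the `grad_sam_scores` helper are all assumptions for this sketch, not the authors' reference implementation.

```python
# A minimal sketch of gradient x self-attention token attribution,
# assuming a Hugging Face BERT-style classifier. The aggregation here
# (attention * ReLU(gradient), averaged over layers/heads/queries) is an
# illustrative reconstruction, not the paper's official code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-SST-2"  # assumed checkpoint; any BERT classifier works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # eval mode, but gradients stay enabled for attribution

def grad_sam_scores(text: str) -> list[tuple[str, float]]:
    """Score each input token by attention weighted with ReLU-gated gradients."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_attentions=True)
    attentions = outputs.attentions  # per layer: (batch, heads, seq, seq)
    top_logit = outputs.logits.max()  # explain the predicted class
    # Gradients of the top logit w.r.t. every layer's attention map.
    grads = torch.autograd.grad(top_logit, attentions)
    # Element-wise product of attention with ReLU-gated gradients,
    # then average over layers, batch, heads, and query positions
    # to get one importance score per (key) token.
    sam = torch.stack([a * g.clamp(min=0) for a, g in zip(attentions, grads)])
    scores = sam.mean(dim=(0, 1, 2, 3))
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for token, score in grad_sam_scores("a gripping, well-acted thriller"):
    print(f"{token:>12s}  {score:.4f}")
```

The ReLU gate keeps only attention entries whose increase would raise the predicted logit, which is what lets this kind of score separate supporting tokens from irrelevant ones.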