[CSRR] Invited Talk: Mor Geva

Mor Geva: "Debugging Transformer Language Models Inside Out."


The process by which transformer language models (LMs) construct their predictions is opaque and largely not understood, which makes it difficult to interpret LM predictions and to fix problematic model behavior. This talk presents recent efforts to tackle this problem by reverse-engineering the operation of the feed-forward network (FFN) layers in transformers. First, I will describe an approach for interpreting the prediction process in LMs and analyze the role of the FFN layers in that process. I will show that the output of each FFN layer can be decomposed into a collection of updates to the model’s output distribution, where the updates are induced by parameter vectors that often promote human-interpretable concepts. Then, I will demonstrate the utility of these findings beyond interpretability, in the context of controlled language generation and computational efficiency. I will conclude with a demonstration of LM-Debugger, an open-source interactive tool that builds on these findings to allow inspecting and intervening in the predictions of transformer LMs.
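
The decomposition mentioned in the abstract can be illustrated with a toy sketch. This is not code from the talk or from LM-Debugger. It assumes a bias-free FFN with a ReLU activation, random weights, and a hypothetical output embedding matrix (`unembed`) used to read vectors out as vocabulary scores; it ignores layer normalization and the residual stream. All names and sizes are made up for illustration.

```python
import random

random.seed(0)

d_model, d_ff, vocab = 4, 6, 5  # toy sizes, chosen only for illustration


def rand_matrix(rows, cols):
    """Random matrix as a list of rows (stand-in for trained weights)."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


keys = rand_matrix(d_ff, d_model)       # rows of the first FFN matrix
values = rand_matrix(d_ff, d_model)     # rows of the second FFN matrix ("value" vectors)
unembed = rand_matrix(vocab, d_model)   # hypothetical output embedding matrix

x = [random.uniform(-1, 1) for _ in range(d_model)]  # input to the FFN layer

# One activation coefficient per value vector (ReLU assumed).
coeffs = [max(0.0, dot(k, x)) for k in keys]

# Standard view: the FFN output is a single vector in model space.
ffn_out = [sum(coeffs[i] * values[i][j] for i in range(d_ff)) for j in range(d_model)]

# Decomposed view: one weighted sub-update per value vector.
sub_updates = [[coeffs[i] * v for v in values[i]] for i in range(d_ff)]


def to_vocab(vec):
    """Project a model-space vector to per-token scores."""
    return [dot(row, vec) for row in unembed]


total_shift = to_vocab(ffn_out)
part_shifts = [to_vocab(u) for u in sub_updates]
summed_shift = [sum(part[t] for part in part_shifts) for t in range(vocab)]

# Because the projection is linear, the per-vector shifts add up to the total.
print(all(abs(a - b) < 1e-9 for a, b in zip(total_shift, summed_shift)))
```

Under these assumptions, each row of `part_shifts` shows which tokens a single value vector pushes up or down, which is the kind of per-vector reading the abstract describes.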


Mor Geva is a researcher at the Allen Institute for AI (AI2) and a Ph.D. candidate in Computer Science at Tel Aviv University. Her research focuses on developing systems that can reason over text in a robust and interpretable manner. During her Ph.D., Mor interned at AI2, Google AI, and Microsoft Media AI. She was awarded the Dan David Prize for graduate students in the field of AI, was nominated as one of the MIT Rising Stars in EECS, and is a laureate of the Séphora Berrebi Scholarship in Computer Science.