LLM-MRI Python module: a brain scanner for LLMs

Luiz Costa; Mateus Figênio; André Santanchè; Luiz Gomes-Jr

doi:10.5753/sbbd_estendido.2024.243136

Luiz Costa Universidade Estadual de Campinas (UNICAMP) http://orcid.org/0009-0005-3838-522X
Mateus Figênio Universidade Tecnológica Federal do Paraná (UTFPR)
André Santanchè Universidade Estadual de Campinas (UNICAMP)
Luiz Gomes-Jr Universidade Tecnológica Federal do Paraná (UTFPR)

Abstract

LLMs (Large Language Models) have demonstrated human-level language and knowledge acquisition skills in several tasks. However, despite their recent success and broad use, understanding how these skills are learned and encoded inside the underlying neural network remains challenging. The goal of the LLM-MRI package is to simplify the study of activation patterns in any transformer-based LLM, much as MRI (magnetic resonance imaging) simplifies the study of biological brains. The package, written in Python, maps neural regions using a parameterized reduction of the model's dimensionality. Neural regions can be visualized according to the forward-pass activations stimulated by a set of documents. In addition, the package can build graph models representing the interlayer network of connections stimulated by a set of documents. These features allow qualitative and quantitative assessments of the underlying structure of activations, depending on the type of documents the LLM is exposed to.
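The two ideas above (grouping activations into neural regions after a parameterized dimensionality reduction, and linking regions of consecutive layers into a graph) can be illustrated with the minimal, self-contained sketch below. It is not the LLM-MRI API: the function names `random_projection`, `project`, `assign_regions`, and `build_region_graph`, as well as the `grid_size` parameter, are hypothetical. A Gaussian random projection stands in for whatever reduction the package actually uses, and the activations are synthetic. With a real model, per-layer activations could be collected, for example, by calling a Hugging Face `transformers` model with `output_hidden_states=True`. The sketch also assumes that an edge weight is the number of documents whose activations fall in region `src` at layer `l` and in region `dst` at layer `l + 1`.

```python
import random
from collections import Counter


def random_projection(dim, out_dim=2, seed=0):
    """Gaussian random projection matrix (dim x out_dim).
    Hypothetical stand-in for the package's dimensionality reduction."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(out_dim)] for _ in range(dim)]


def project(vec, matrix):
    # Multiply a 1 x dim row vector by a dim x out_dim matrix.
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(len(matrix[0]))]


def assign_regions(points, grid_size):
    """Map 2-D points to cells of a grid_size x grid_size grid.
    Region ids run from 0 to grid_size**2 - 1 (row-major)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def to_cell(value, lo, hi):
        if hi == lo:
            return 0
        # Scale to [0, grid_size) and clamp the maximum into the last cell.
        return min(int((value - lo) / (hi - lo) * grid_size), grid_size - 1)

    return [to_cell(x, min(xs), max(xs)) * grid_size + to_cell(y, min(ys), max(ys))
            for x, y in points]


def build_region_graph(layer_activations, grid_size=3, seed=0):
    """layer_activations[l][d] is the activation vector of document d at layer l.
    Returns per-layer region assignments and a Counter of weighted edges
    ((l, src_region), (l + 1, dst_region)) -> number of documents."""
    regions = []
    for layer in layer_activations:
        # Fixed seed: every layer of the same width gets the same projection.
        matrix = random_projection(len(layer[0]), seed=seed)
        points = [project(vec, matrix) for vec in layer]
        regions.append(assign_regions(points, grid_size))

    edges = Counter()
    for l in range(len(regions) - 1):
        for src, dst in zip(regions[l], regions[l + 1]):
            edges[((l, src), (l + 1, dst))] += 1
    return regions, edges


if __name__ == "__main__":
    rng = random.Random(42)
    n_layers, n_docs, hidden = 4, 10, 16
    # Synthetic activations: n_layers x n_docs x hidden.
    acts = [[[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(n_docs)]
            for _ in range(n_layers)]
    regions, edges = build_region_graph(acts, grid_size=3)
    print(len(regions), sum(edges.values()))  # prints: 4 30
```

A larger `grid_size` yields finer regions (more nodes and sparser edges). The resulting edge counter could be loaded into a graph library such as NetworkX for layout or further analysis.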

Keywords: Neural Networks, Interpretability, Large Language Models
