Staff View: Interpreting and Steering Protein Language Models through Sparse Autoencoders :: Library Catalog

Bibliographic Details
Main Authors:	Garcia, Edith Natalia Villegas; Ansuini, Alessio
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Biomolecules
Online Access:	https://arxiv.org/abs/2502.09135

_version_	1866912230987005952
author	Garcia, Edith Natalia Villegas; Ansuini, Alessio
author_facet	Garcia, Edith Natalia Villegas; Ansuini, Alessio
contents	The rapid advancements in transformer-based language models have revolutionized natural language processing, yet understanding the internal mechanisms of these models remains a significant challenge. This paper explores the application of sparse autoencoders (SAE) to interpret the internal representations of protein language models, specifically focusing on the ESM-2 8M parameter model. By performing a statistical analysis on each latent component's relevance to distinct protein annotations, we identify potential interpretations linked to various protein characteristics, including transmembrane regions, binding sites, and specialized motifs. We then leverage these insights to guide sequence generation, shortlisting the relevant latent components that can steer the model towards desired targets such as zinc finger domains. This work contributes to the emerging field of mechanistic interpretability in biological sequence models, offering new perspectives on model steering for sequence design.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_09135
institution	arXiv
publishDate	2025
record_format	arxiv
title	Interpreting and Steering Protein Language Models through Sparse Autoencoders
topic	Machine Learning; Biomolecules
url	https://arxiv.org/abs/2502.09135
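
Illustrative note: the abstract in this record describes encoding a model's hidden states with a sparse autoencoder and then steering generation along selected latent components. The toy sketch below shows one plausible shape of those two operations in plain Python. All names, sizes, and weights here are hypothetical and randomly initialized; nothing is taken from the paper or from the ESM-2 code, and neither training nor the statistical analysis against protein annotations is shown.

```python
# Illustrative sketch only: names, shapes, and weights are hypothetical,
# not taken from the paper or from the ESM-2 codebase.
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sae_encode(x, W_enc, b_enc):
    """Sparse latent code: ReLU(W_enc @ x + b_enc)."""
    return [max(0.0, z + b) for z, b in zip(matvec(W_enc, x), b_enc)]


def sae_decode(z, W_dec, b_dec):
    """Reconstruct the activation: W_dec @ z + b_dec."""
    return [r + b for r, b in zip(matvec(W_dec, z), b_dec)]


def steer(x, W_dec, latent_idx, strength):
    """Shift activation x along the decoder direction of one latent."""
    direction = [row[latent_idx] for row in W_dec]
    return [v + strength * d for v, d in zip(x, direction)]


random.seed(0)
d_model, d_latent = 4, 8  # toy sizes chosen for readability

# Untrained random weights; a real SAE would learn these from activations.
W_enc = [[random.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_latent)]
W_dec = [[random.gauss(0, 0.5) for _ in range(d_latent)] for _ in range(d_model)]
b_enc = [0.0] * d_latent
b_dec = [0.0] * d_model

# Stand-in for the hidden state of a single residue position.
x = [random.gauss(0, 1) for _ in range(d_model)]

z = sae_encode(x, W_enc, b_enc)          # non-negative latent activations
x_hat = sae_decode(z, W_dec, b_dec)      # reconstruction of x
x_steered = steer(x, W_dec, latent_idx=3, strength=2.0)
```

In this sketch, steering adds a scaled copy of one decoder column to the hidden state. That is one common way to intervene on a latent, assumed here for illustration; the record does not specify how the authors implement it.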

Similar Items