Library Catalog

Bibliographic Details
Main Authors:	De Sabbata, Stef, Mizzaro, Stefano, Roitero, Kevin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.03368

Similar Items

Exploring Geographic Relative Space in Large Language Models through Activation Patching
by: De Sabbata, Stef, et al.
Published: (2026)

The Effect of Document Summarization on LLM-Based Relevance Judgments
by: Mohtadi, Samaneh, et al.
Published: (2025)

On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
by: Lunardi, Riccardo, et al.
Published: (2025)

Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking
by: Roitero, Kevin, et al.
Published: (2025)

Rational Metareasoning for Large Language Models
by: De Sabbata, C. Nicolò, et al.
Published: (2024)

GeoLLM: Extracting Geospatial Knowledge from Large Language Models
by: Manvi, Rohin, et al.
Published: (2023)

Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
by: Winninger, Thomas, et al.
Published: (2025)

Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
by: Simbeck, Katharina, et al.
Published: (2025)

How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
by: Liu, Qi, et al.
Published: (2025)

Challenges in Mechanistically Interpreting Model Representations
by: Golechha, Satvik, et al.
Published: (2024)

Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
by: García-Carrasco, Jorge, et al.
Published: (2024)

Open Problems in Mechanistic Interpretability
by: Sharkey, Lee, et al.
Published: (2025)

Exemplar Partitioning for Mechanistic Interpretability
by: Rumbelow, Jessica
Published: (2026)

From Mechanistic to Compositional Interpretability
by: Gauderis, Ward, et al.
Published: (2026)

Mechanistic Interpretability of RNNs emulating Hidden Markov Models
by: Torre, Elia, et al.
Published: (2025)

Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
by: Kim, Geonhee, et al.
Published: (2024)

Mechanistic Interpretability for Neural TSP Solvers
by: Narad, Reuben, et al.
Published: (2025)

Mechanistic Interpretability of Reinforcement Learning Agents
by: Trim, Tristan, et al.
Published: (2024)

Validating Mechanistic Interpretations: An Axiomatic Approach
by: Palumbo, Nils, et al.
Published: (2024)

Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
by: AlKhamissi, Badr, et al.
Published: (2025)

Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications
by: Lee, Yoon Pyo
Published: (2025)

Predicting missing values: A good idea?
by: van Buuren, Stef
Published: (2026)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
by: Bushnaq, Lucius, et al.
Published: (2024)

Compact Proofs of Model Performance via Mechanistic Interpretability
by: Gross, Jason, et al.
Published: (2024)

Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
by: Wang, Shengyuan, et al.
Published: (2025)

Mechanistic Interpretability Tool for AI Weather Models
by: Tempest, Kirsten I., et al.
Published: (2026)

Mechanistic Interpretability of Binary and Ternary Transformers
by: Li, Jason
Published: (2024)

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
by: Lin, Zihao, et al.
Published: (2025)

Mechanistic Interpretability of Brain-to-Speech Models Across Speech Modes
by: Maghsoudi, Maryam, et al.
Published: (2026)

reward-lens: A Mechanistic Interpretability Library for Reward Models
by: Nadaf, Mohammed Suhail B
Published: (2026)

TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route
by: Luo, Hongyi, et al.
Published: (2025)

Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control
by: Saini, Harshvardhan, et al.
Published: (2026)

Mechanistic Interpretability of GPT-like Models on Summarization Tasks
by: Mishra, Anurag
Published: (2025)

OceanCBM: A Concept Bottleneck Model for Mechanistic Interpretability in Ocean Forecasting
by: Suri, Sanah, et al.
Published: (2026)

LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
by: van Buuren, Stef
Published: (2026)

Interpretable Deep Learning for Polar Mechanistic Reaction Prediction
by: Miller, Ryan J., et al.
Published: (2025)

Cluster-Based Random Forest Visualization and Interpretation
by: Sondag, Max, et al.
Published: (2025)