| Main Authors: | De Sabbata, Stef, Mizzaro, Stefano, Roitero, Kevin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.03368 |
Similar Items
Exploring Geographic Relative Space in Large Language Models through Activation Patching
by: De Sabbata, Stef, et al.
Published: (2026)
The Effect of Document Summarization on LLM-Based Relevance Judgments
by: Mohtadi, Samaneh, et al.
Published: (2025)
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
by: Lunardi, Riccardo, et al.
Published: (2025)
Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking
by: Roitero, Kevin, et al.
Published: (2025)
Rational Metareasoning for Large Language Models
by: De Sabbata, C. Nicolò, et al.
Published: (2024)
GeoLLM: Extracting Geospatial Knowledge from Large Language Models
by: Manvi, Rohin, et al.
Published: (2023)
Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
by: Winninger, Thomas, et al.
Published: (2025)
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
by: Simbeck, Katharina, et al.
Published: (2025)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
by: Liu, Qi, et al.
Published: (2025)
Challenges in Mechanistically Interpreting Model Representations
by: Golechha, Satvik, et al.
Published: (2024)
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
by: García-Carrasco, Jorge, et al.
Published: (2024)
Open Problems in Mechanistic Interpretability
by: Sharkey, Lee, et al.
Published: (2025)
Exemplar Partitioning for Mechanistic Interpretability
by: Rumbelow, Jessica
Published: (2026)
From Mechanistic to Compositional Interpretability
by: Gauderis, Ward, et al.
Published: (2026)
Mechanistic Interpretability of RNNs emulating Hidden Markov Models
by: Torre, Elia, et al.
Published: (2025)
Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
by: Kim, Geonhee, et al.
Published: (2024)
Mechanistic Interpretability for Neural TSP Solvers
by: Narad, Reuben, et al.
Published: (2025)
Mechanistic Interpretability of Reinforcement Learning Agents
by: Trim, Tristan, et al.
Published: (2024)
Validating Mechanistic Interpretations: An Axiomatic Approach
by: Palumbo, Nils, et al.
Published: (2024)
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
by: AlKhamissi, Badr, et al.
Published: (2025)
Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications
by: Lee, Yoon Pyo
Published: (2025)
Predicting missing values: A good idea?
by: van Buuren, Stef
Published: (2026)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
by: Bushnaq, Lucius, et al.
Published: (2024)
Compact Proofs of Model Performance via Mechanistic Interpretability
by: Gross, Jason, et al.
Published: (2024)
Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
by: Wang, Shengyuan, et al.
Published: (2025)
Mechanistic Interpretability Tool for AI Weather Models
by: Tempest, Kirsten I., et al.
Published: (2026)
Mechanistic Interpretability of Binary and Ternary Transformers
by: Li, Jason
Published: (2024)
A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
by: Lin, Zihao, et al.
Published: (2025)
Mechanistic Interpretability of Brain-to-Speech Models Across Speech Modes
by: Maghsoudi, Maryam, et al.
Published: (2026)
reward-lens: A Mechanistic Interpretability Library for Reward Models
by: Nadaf, Mohammed Suhail B
Published: (2026)
TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route
by: Luo, Hongyi, et al.
Published: (2025)
Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control
by: Saini, Harshvardhan, et al.
Published: (2026)
Mechanistic Interpretability of GPT-like Models on Summarization Tasks
by: Mishra, Anurag
Published: (2025)
OceanCBM: A Concept Bottleneck Model for Mechanistic Interpretability in Ocean Forecasting
by: Suri, Sanah, et al.
Published: (2026)
LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
by: van Buuren, Stef
Published: (2026)
Interpretable Deep Learning for Polar Mechanistic Reaction Prediction
by: Miller, Ryan J., et al.
Published: (2025)
Cluster-Based Random Forest Visualization and Interpretation
by: Sondag, Max, et al.
Published: (2025)