:: Library Catalog

Bibliographic Details
Main Author:	Trott, Sean
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2509.22831

Similar Items

Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology
by: Shin, Hagyeong, et al.
Published: (2024)

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
by: Schoenegger, Philipp, et al.
Published: (2024)

Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models
by: He, Zoe Wanying, et al.
Published: (2025)

Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
by: Chhabra, Vishnu Kabir, et al.
Published: (2025)

Mechanistic Interpretability Needs Philosophy
by: Williams, Iwan, et al.
Published: (2025)

HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)

Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
by: Trott, Sean, et al.
Published: (2026)

Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)

The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
by: Bai, Xiaoyan, et al.
Published: (2026)

Intrinsic Self-Correction in LLMs: Towards Explainable Prompting via Mechanistic Interpretability
by: Lee, Yu-Ting, et al.
Published: (2025)

Mechanistic Interpretability of Emotion Inference in Large Language Models
by: Tak, Ala N., et al.
Published: (2025)

Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
by: Chandna, Bhavik, et al.
Published: (2025)

MIB: A Mechanistic Interpretability Benchmark
by: Mueller, Aaron, et al.
Published: (2025)

Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
by: Raimondi, Bianca, et al.
Published: (2025)

Mechanistic Interpretability of Socio-Political Frames in Language Models
by: Asghari, Hadi, et al.
Published: (2025)

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
by: Kim, Dongjun, et al.
Published: (2025)

Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
by: Lee, Jae Hee, et al.
Published: (2025)

Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis
by: Hatua, Amartya
Published: (2025)

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
by: Zhu, Xinyu, et al.
Published: (2026)

Toward Mechanistic Explanation of Deductive Reasoning in Language Models
by: Maltoni, Davide, et al.
Published: (2025)

Mechanistic Interpretability of GPT-like Models on Summarization Tasks
by: Mishra, Anurag
Published: (2025)

Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis
by: Méloux, Maxime, et al.
Published: (2025)

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
by: Méloux, Maxime, et al.
Published: (2025)

Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
by: Richardson, Andrew Keenan, et al.
Published: (2025)

DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models
by: Fu, Jiachen, et al.
Published: (2025)

Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
by: Raimondi, Bianca, et al.
Published: (2026)

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation
by: Habibi, Reza, et al.
Published: (2026)

Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
by: Song, Xiangchen, et al.
Published: (2025)

How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding
by: Chen, Xi, et al.
Published: (2025)

xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection
by: Girón, Adrián, et al.
Published: (2026)

How Trustworthy Are LLM-as-Judge Ratings for Interpretive Responses? Implications for Qualitative Research Workflows
by: Han, Songhee, et al.
Published: (2026)

MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
by: Yang, Kailai, et al.
Published: (2024)

Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
by: Kim, Geonhee, et al.
Published: (2024)

Towards Quantifying Commonsense Reasoning with Mechanistic Insights
by: Joshi, Abhinav, et al.
Published: (2025)

Towards Compositionally Generalizable Semantic Parsing in Large Language Models: A Survey
by: Mannekote, Amogh
Published: (2024)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability
by: Nainani, Jatin, et al.
Published: (2024)

DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
by: Peng, Dengyun, et al.
Published: (2025)

Patch-Effect Graph Kernels for LLM Interpretability
by: Fernandez-Boullon, Ruben, et al.
Published: (2026)