Library Catalog

Bibliographic Details
Main Authors:	Shafran, Or; Ronen, Shaked; Fahn, Omri; Ravfogel, Shauli; Geiger, Atticus; Geva, Mor
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.02464

Similar Items

Constructing Interpretable Features from Compositional Neuron Groups
by: Shafran, Or, et al.
Published: (2025)

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
by: Gur-Arieh, Yoav, et al.
Published: (2025)

Intrinsic Test of Unlearning Using Parametric Knowledge Traces
by: Hong, Yihuai, et al.
Published: (2024)

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
by: Huang, Jing, et al.
Published: (2024)

Enhancing Automated Interpretability with Output-Centric Feature Descriptions
by: Gur-Arieh, Yoav, et al.
Published: (2025)

Gumbel Counterfactual Generation From Language Models
by: Ravfogel, Shauli, et al.
Published: (2024)

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
by: Ben-Zaken, Elad, et al.
Published: (2021)

Log-linear Guardedness and its Implications
by: Ravfogel, Shauli, et al.
Published: (2022)

Emergence of Linear Truth Encodings in Language Models
by: Ravfogel, Shauli, et al.
Published: (2025)

Detecting (Un)answerability in Large Language Models with Linear Directions
by: Lavi, Maor Juliet, et al.
Published: (2025)

Estimating Knowledge in Large Language Models Without Generating a Single Token
by: Gottesman, Daniela, et al.
Published: (2024)

Diversity Over Quantity: A Lesson From Few Shot Relation Classification
by: Cohen, Amir DN, et al.
Published: (2024)

Geometric Factual Recall in Transformers
by: Ravfogel, Shauli, et al.
Published: (2026)

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models
by: Yona, Itay, et al.
Published: (2026)

A Practical Method for Generating String Counterfactuals
by: Avitan, Matan, et al.
Published: (2024)

RELIC: Evaluating Complex Reasoning via the Recognition of Languages In-Context
by: Petty, Jackson, et al.
Published: (2025)

Kernelized Concept Erasure
by: Ravfogel, Shauli, et al.
Published: (2022)

State over Tokens: Characterizing the Role of Reasoning Tokens
by: Levy, Mosh, et al.
Published: (2025)

Linear Adversarial Concept Erasure
by: Ravfogel, Shauli, et al.
Published: (2022)

Activation Steering via Generative Causal Mediation
by: Sankaranarayanan, Aruna, et al.
Published: (2026)

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
by: Ahrac, Sagi, et al.
Published: (2026)

Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
by: Yona, Gal, et al.
Published: (2024)

The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
by: Schäfer, Anton, et al.
Published: (2024)

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
by: Ivgi, Maor, et al.
Published: (2024)

Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval
by: Chen, Hung-Ting, et al.
Published: (2025)

The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure
by: Fan, Yu, et al.
Published: (2025)

Pretrained LLMs Learn Multiple Types of Uncertainty
by: Cohen, Roi, et al.
Published: (2025)

Inferring Functionality of Attention Heads from their Parameters
by: Elhelo, Amit, et al.
Published: (2024)

IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs
by: Maimon, Aviya, et al.
Published: (2025)

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
by: Cohen, Ido, et al.
Published: (2024)

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
by: Rassin, Royi, et al.
Published: (2023)

Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models
by: Yalon, Noam Steinmetz, et al.
Published: (2026)

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)

Representation Surgery: Theory and Practice of Affine Steering
by: Singh, Shashwat, et al.
Published: (2024)

Description-Based Text Similarity
by: Ravfogel, Shauli, et al.
Published: (2023)

Discrete Diffusion Models Exploit Asymmetry to Solve Lookahead Planning Tasks
by: Trainin, Itamar, et al.
Published: (2026)

Hallucinations Undermine Trust; Metacognition is a Way Forward
by: Yona, Gal, et al.
Published: (2026)

Eliciting Textual Descriptions from Representations of Continuous Prompts
by: Ramati, Dana, et al.
Published: (2024)

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)

Do Large Language Models Latently Perform Multi-Hop Reasoning?
by: Yang, Sohee, et al.
Published: (2024)