:: Library Catalog

Bibliographic Details
Main Authors:	Tavor, Almog; Ebenspanger, Itay; Cnaan, Neil; Geva, Mor
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.01395

Similar Items

Estimating Knowledge in Large Language Models Without Generating a Single Token
by: Gottesman, Daniela, et al.
Published: (2024)

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models
by: Yona, Itay, et al.
Published: (2026)

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)

Inferring Functionality of Attention Heads from their Parameters
by: Elhelo, Amit, et al.
Published: (2024)

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
by: Cohen, Ido, et al.
Published: (2024)

Constructing Interpretable Features from Compositional Neuron Groups
by: Shafran, Or, et al.
Published: (2025)

Hallucinations Undermine Trust; Metacognition is a Way Forward
by: Yona, Gal, et al.
Published: (2026)

Eliciting Textual Descriptions from Representations of Continuous Prompts
by: Ramati, Dana, et al.
Published: (2024)

Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
by: Yona, Gal, et al.
Published: (2024)

Intrinsic Test of Unlearning Using Parametric Knowledge Traces
by: Hong, Yihuai, et al.
Published: (2024)

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
by: Ahrac, Sagi, et al.
Published: (2026)

Preventing Rogue Agents Improves Multi-Agent Collaboration
by: Barbi, Ohav, et al.
Published: (2025)

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth
by: Gur-Arieh, Yoav, et al.
Published: (2026)

Disentangling MLP Neuron Weights in Vocabulary Space
by: Avrahamy, Asaf, et al.
Published: (2026)

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
by: Gur-Arieh, Yoav, et al.
Published: (2025)

Detecting (Un)answerability in Large Language Models with Linear Directions
by: Lavi, Maor Juliet, et al.
Published: (2025)

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2026)

Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
by: Kour, George, et al.
Published: (2025)

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
by: Grosbard, Idan Daniel, et al.
Published: (2026)

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
by: Ivgi, Maor, et al.
Published: (2024)

Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models
by: Yalon, Noam Steinmetz, et al.
Published: (2026)

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
by: Din, Alexander Yom, et al.
Published: (2023)

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
by: Yang, Sohee, et al.
Published: (2024)

Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
by: Biran, Eden, et al.
Published: (2024)

Do Large Language Models Latently Perform Multi-Hop Reasoning?
by: Yang, Sohee, et al.
Published: (2024)

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)

LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
by: Gottesman, Daniela, et al.
Published: (2025)

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
by: Shafran, Or, et al.
Published: (2026)

Enhancing Automated Interpretability with Output-Centric Feature Descriptions
by: Gur-Arieh, Yoav, et al.
Published: (2025)

The Hidden Space of Transformer Language Adapters
by: Alabi, Jesujoba O., et al.
Published: (2024)

Precise In-Parameter Concept Erasure in Large Language Models
by: Gur-Arieh, Yoav, et al.
Published: (2025)

From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
by: Mosbach, Marius, et al.
Published: (2024)

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
by: Huang, Jing, et al.
Published: (2024)

On the Robustness of Agentic Function Calling
by: Rabinovich, Ella, et al.
Published: (2025)

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)

Effective Red-Teaming of Policy-Adherent Agents
by: Nakash, Itay, et al.
Published: (2025)

How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
by: Yang, Sohee, et al.
Published: (2025)

What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models
by: Hirsch, Eran, et al.
Published: (2024)

Latent Reasoning with Supervised Thinking States
by: Amos, Ido, et al.
Published: (2026)