| Main Authors: | Tavor, Almog; Ebenspanger, Itay; Cnaan, Neil; Geva, Mor |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01395 |
Similar Items
Estimating Knowledge in Large Language Models Without Generating a Single Token
by: Gottesman, Daniela, et al.
Published: (2024)
Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models
by: Yona, Itay, et al.
Published: (2026)
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)
Inferring Functionality of Attention Heads from their Parameters
by: Elhelo, Amit, et al.
Published: (2024)
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
by: Cohen, Ido, et al.
Published: (2024)
Constructing Interpretable Features from Compositional Neuron Groups
by: Shafran, Or, et al.
Published: (2025)
Hallucinations Undermine Trust; Metacognition is a Way Forward
by: Yona, Gal, et al.
Published: (2026)
Eliciting Textual Descriptions from Representations of Continuous Prompts
by: Ramati, Dana, et al.
Published: (2024)
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
by: Yona, Gal, et al.
Published: (2024)
Intrinsic Test of Unlearning Using Parametric Knowledge Traces
by: Hong, Yihuai, et al.
Published: (2024)
Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
by: Ahrac, Sagi, et al.
Published: (2026)
Preventing Rogue Agents Improves Multi-Agent Collaboration
by: Barbi, Ohav, et al.
Published: (2025)
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth
by: Gur-Arieh, Yoav, et al.
Published: (2026)
Disentangling MLP Neuron Weights in Vocabulary Space
by: Avrahamy, Asaf, et al.
Published: (2026)
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
by: Gur-Arieh, Yoav, et al.
Published: (2025)
Detecting (Un)answerability in Large Language Models with Linear Directions
by: Lavi, Maor Juliet, et al.
Published: (2025)
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2026)
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
by: Kour, George, et al.
Published: (2025)
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
by: Grosbard, Idan Daniel, et al.
Published: (2026)
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
by: Ivgi, Maor, et al.
Published: (2024)
Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models
by: Yalon, Noam Steinmetz, et al.
Published: (2026)
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
by: Din, Alexander Yom, et al.
Published: (2023)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
by: Yang, Sohee, et al.
Published: (2024)
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
by: Biran, Eden, et al.
Published: (2024)
Do Large Language Models Latently Perform Multi-Hop Reasoning?
by: Yang, Sohee, et al.
Published: (2024)
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
by: Gottesman, Daniela, et al.
Published: (2025)
From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
by: Shafran, Or, et al.
Published: (2026)
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
by: Gur-Arieh, Yoav, et al.
Published: (2025)
The Hidden Space of Transformer Language Adapters
by: Alabi, Jesujoba O., et al.
Published: (2024)
Precise In-Parameter Concept Erasure in Large Language Models
by: Gur-Arieh, Yoav, et al.
Published: (2025)
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
by: Mosbach, Marius, et al.
Published: (2024)
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
by: Huang, Jing, et al.
Published: (2024)
On the Robustness of Agentic Function Calling
by: Rabinovich, Ella, et al.
Published: (2025)
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)
Effective Red-Teaming of Policy-Adherent Agents
by: Nakash, Itay, et al.
Published: (2025)
How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
by: Yang, Sohee, et al.
Published: (2025)
What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models
by: Hirsch, Eran, et al.
Published: (2024)
Latent Reasoning with Supervised Thinking States
by: Amos, Ido, et al.
Published: (2026)