| Main Authors: | Chughtai, Bilal; Cooney, Alan; Nanda, Neel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.07321 |
Similar Items
Building Production-Ready Probes For Gemini
by: Kramár, János, et al.
Published: (2026)
Difficulties with Evaluating a Deception Detector for AIs
by: Smith, Lewis, et al.
Published: (2025)
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
by: Yuan, Jiaqing, et al.
Published: (2024)
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
by: Lv, Ang, et al.
Published: (2024)
Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency
by: Smith, Matthew L., et al.
Published: (2026)
Transformer Circuit Faithfulness Metrics are not Robust
by: Miller, Joseph, et al.
Published: (2024)
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
by: Minder, Julian, et al.
Published: (2025)
Explorations of Self-Repair in Language Models
by: Rushing, Cody, et al.
Published: (2024)
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
by: Zhang, Fred, et al.
Published: (2023)
Transcoders Find Interpretable LLM Feature Circuits
by: Dunefsky, Jacob, et al.
Published: (2024)
Understanding Factual Recall in Transformers via Associative Memories
by: Nichani, Eshaan, et al.
Published: (2024)
Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall
by: Wang, Qianli, et al.
Published: (2025)
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
by: Maar, Jim, et al.
Published: (2026)
The Impact of Inference Acceleration on Bias of LLMs
by: Kirsten, Elisabeth, et al.
Published: (2024)
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
by: Zhang, Zhuoran, et al.
Published: (2024)
Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts
by: Mujahid, Zain Muhammad, et al.
Published: (2025)
Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks
by: Karvonen, Adam, et al.
Published: (2024)
AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
by: Casademunt, Helena, et al.
Published: (2026)
Towards Reliable Latent Knowledge Estimation in LLMs: Zero-Prompt Many-Shot Based Factual Knowledge Extraction
by: Wu, Qinyuan, et al.
Published: (2024)
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
by: Laine, Rudolf, et al.
Published: (2024)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
by: Ferrando, Javier, et al.
Published: (2024)
Exploring Precision and Recall to assess the quality and diversity of LLMs
by: Bronnec, Florian Le, et al.
Published: (2024)
Persuasion Tokens for Editing Factual Knowledge in LLMs
by: Youssef, Paul, et al.
Published: (2026)
Simple Mechanistic Explanations for Out-Of-Context Reasoning
by: Wang, Atticus, et al.
Published: (2025)
Thought Branches: Interpreting LLM Reasoning Requires Resampling
by: Macar, Uzay, et al.
Published: (2025)
Thought Anchors: Which LLM Reasoning Steps Matter?
by: Bogdan, Paul C., et al.
Published: (2025)
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
by: Mahaut, Matéo, et al.
Published: (2024)
FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language Models
by: Wang, Yanling, et al.
Published: (2024)
Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs
by: Lei, Ge, et al.
Published: (2025)
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
by: Rahman, Subhey Sadi, et al.
Published: (2025)
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
by: Li, Junliang, et al.
Published: (2025)
Scaling sparse feature circuit finding for in-context learning
by: Kharlapenko, Dmitrii, et al.
Published: (2025)
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees
by: Nie, Fan, et al.
Published: (2024)
Patent Language Model Pretraining with ModernBERT
by: Yousefiramandi, Amirhossein, et al.
Published: (2025)
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
by: Pink, Mathis, et al.
Published: (2024)
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
by: Casademunt, Helena, et al.
Published: (2025)
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)
Real-Time Detection of Hallucinated Entities in Long-Form Generation
by: Obeso, Oscar, et al.
Published: (2025)
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
by: Sawczyn, Albert, et al.
Published: (2025)