:: Library Catalog

Bibliographic Details
Main Authors:	Chughtai, Bilal, Cooney, Alan, Nanda, Neel
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.07321

Similar Items

Building Production-Ready Probes For Gemini
by: Kramár, János, et al.
Published: (2026)

Difficulties with Evaluating a Deception Detector for AIs
by: Smith, Lewis, et al.
Published: (2025)

Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
by: Yuan, Jiaqing, et al.
Published: (2024)

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
by: Lv, Ang, et al.
Published: (2024)

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency
by: Smith, Matthew L., et al.
Published: (2026)

Transformer Circuit Faithfulness Metrics are not Robust
by: Miller, Joseph, et al.
Published: (2024)

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
by: Minder, Julian, et al.
Published: (2025)

Explorations of Self-Repair in Language Models
by: Rushing, Cody, et al.
Published: (2024)

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
by: Zhang, Fred, et al.
Published: (2023)

Transcoders Find Interpretable LLM Feature Circuits
by: Dunefsky, Jacob, et al.
Published: (2024)

Understanding Factual Recall in Transformers via Associative Memories
by: Nichani, Eshaan, et al.
Published: (2024)

Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall
by: Wang, Qianli, et al.
Published: (2025)

What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
by: Maar, Jim, et al.
Published: (2026)

The Impact of Inference Acceleration on Bias of LLMs
by: Kirsten, Elisabeth, et al.
Published: (2024)

Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
by: Zhang, Zhuoran, et al.
Published: (2024)

Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts
by: Mujahid, Zain Muhammad, et al.
Published: (2025)

Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks
by: Karvonen, Adam, et al.
Published: (2024)

AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
by: Casademunt, Helena, et al.
Published: (2026)

Towards Reliable Latent Knowledge Estimation in LLMs: Zero-Prompt Many-Shot Based Factual Knowledge Extraction
by: Wu, Qinyuan, et al.
Published: (2024)

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
by: Laine, Rudolf, et al.
Published: (2024)

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
by: Ferrando, Javier, et al.
Published: (2024)

Exploring Precision and Recall to assess the quality and diversity of LLMs
by: Bronnec, Florian Le, et al.
Published: (2024)

Persuasion Tokens for Editing Factual Knowledge in LLMs
by: Youssef, Paul, et al.
Published: (2026)

Simple Mechanistic Explanations for Out-Of-Context Reasoning
by: Wang, Atticus, et al.
Published: (2025)

Thought Branches: Interpreting LLM Reasoning Requires Resampling
by: Macar, Uzay, et al.
Published: (2025)

Thought Anchors: Which LLM Reasoning Steps Matter?
by: Bogdan, Paul C., et al.
Published: (2025)

Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
by: Mahaut, Matéo, et al.
Published: (2024)

FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language Models
by: Wang, Yanling, et al.
Published: (2024)

Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs
by: Lei, Ge, et al.
Published: (2025)

Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
by: Rahman, Subhey Sadi, et al.
Published: (2025)

Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
by: Li, Junliang, et al.
Published: (2025)

Scaling sparse feature circuit finding for in-context learning
by: Kharlapenko, Dmitrii, et al.
Published: (2025)

FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees
by: Nie, Fan, et al.
Published: (2024)

Patent Language Model Pretraining with ModernBERT
by: Yousefiramandi, Amirhossein, et al.
Published: (2025)

Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
by: Pink, Mathis, et al.
Published: (2024)

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
by: Casademunt, Helena, et al.
Published: (2025)

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)

Real-Time Detection of Hallucinated Entities in Long-Form Generation
by: Obeso, Oscar, et al.
Published: (2025)

FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
by: Sawczyn, Albert, et al.
Published: (2025)