:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Yiyou; Gai, Yu; Chen, Lijie; Ravichander, Abhilasha; Choi, Yejin; Song, Dawn
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.12691

Similar Items

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
by: Balepur, Nishant, et al.
Published: (2024)

RESTOR: Knowledge Recovery in Machine Unlearning
by: Rezaei, Keivan, et al.
Published: (2024)

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
by: Zhao, Wenting, et al.
Published: (2024)

What Has Been Lost with Synthetic Evaluation?
by: Gill, Alexander, et al.
Published: (2025)

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
by: Lin, Bill Yuchen, et al.
Published: (2024)

The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
by: Newman, Benjamin, et al.
Published: (2025)

RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
by: Sun, Yiyou, et al.
Published: (2025)

Agent Lumos: Unified and Modular Training for Open-Source Language Agents
by: Yin, Da, et al.
Published: (2023)

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
by: Ravichander, Abhilasha, et al.
Published: (2025)

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
by: Hallinan, Skyler, et al.
Published: (2025)

MacGyver: Are Large Language Models Creative Problem Solvers?
by: Tian, Yufei, et al.
Published: (2023)

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
by: Sun, Yiyou, et al.
Published: (2025)

Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
by: Khan, Mohammad Aflah, et al.
Published: (2026)

Can LLMs Ask Good Questions?
by: Zhang, Yueheng, et al.
Published: (2025)

Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)

Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
by: Liang, Weida, et al.
Published: (2026)

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)

KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
by: Zhang, Jiawei, et al.
Published: (2024)

How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?
by: Mondal, Ishani, et al.
Published: (2024)

The Art of Saying No: Contextual Noncompliance in Language Models
by: Brahman, Faeze, et al.
Published: (2024)

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs
by: Wang, Siyuan, et al.
Published: (2024)

In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
by: Khan, Mohammad Aflah, et al.
Published: (2026)

CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning
by: Li, Cheng-Yen, et al.
Published: (2026)

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
by: Chiu, Yu Ying, et al.
Published: (2024)

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
by: Li, Xu, et al.
Published: (2026)

DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
by: Sun, Yiyou, et al.
Published: (2024)

Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs
by: Kim, Sewon, et al.
Published: (2025)

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations
by: Li, Shanghao, et al.
Published: (2026)

Look Within, Why LLMs Hallucinate: A Causal Perspective
by: Li, He, et al.
Published: (2024)

Why Fine-Tuning Encourages Hallucinations and How to Fix It
by: Kaplan, Guy, et al.
Published: (2026)

Why Language Models Hallucinate
by: Kalai, Adam Tauman, et al.
Published: (2025)

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
by: Treutlein, Johannes, et al.
Published: (2024)

Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning
by: Sorensen, Taylor, et al.
Published: (2025)

Understanding How Value Neurons Shape the Generation of Specified Values in LLMs
by: Su, Yi, et al.
Published: (2025)

Why LLMs Cannot Think and How to Fix It
by: Jahrens, Marius, et al.
Published: (2025)

DALD: Improving Logits-based Detector without Logits from Black-box LLMs
by: Zeng, Cong, et al.
Published: (2024)

From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
by: Belem, Catarina G., et al.
Published: (2024)