| Main Authors: | Sun, Chongren, Li, Yuran, Wu, Di, Boulet, Benoit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.12975 |
Similar Items
A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement
by: Li, Yuran, et al.
Published: (2026)
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
by: Li, Yuran, et al.
Published: (2025)
HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
by: Zhu, Zhiying, et al.
Published: (2024)
Beyond Facts: Evaluating Intent Hallucination in Large Language Models
by: Hao, Yijie, et al.
Published: (2025)
PerHalluEval: Persian Hallucination Evaluation Benchmark for Large Language Models
by: Hosseini, Mohammad, et al.
Published: (2025)
Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning
by: Li, Ningke, et al.
Published: (2025)
On Large Language Models' Hallucination with Regard to Known Facts
by: Jiang, Che, et al.
Published: (2024)
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
by: Wu, Junjie, et al.
Published: (2024)
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
by: Rahman, Subhey Sadi, et al.
Published: (2025)
Towards Unification of Hallucination Detection and Fact Verification for Large Language Models
by: Su, Weihang, et al.
Published: (2025)
RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models
by: Shen, Tianhao, et al.
Published: (2023)
Language Models Hallucinate, but May Excel at Fact Verification
by: Guan, Jian, et al.
Published: (2023)
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
by: Yang, Haote, et al.
Published: (2025)
BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
by: Adib, Shefayat E Shams, et al.
Published: (2026)
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
by: Jiang, Chaoya, et al.
Published: (2024)
Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models
by: Patel, Piyushkumar
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)
Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models
by: Zhang, Hanzhi, et al.
Published: (2025)
Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination
by: Song, Jongyoon, et al.
Published: (2024)
R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
by: Cheng, Xiaoxue, et al.
Published: (2024)
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
by: Li, Haitao, et al.
Published: (2024)
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
by: Yang, Linyao, et al.
Published: (2023)
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
by: Zhao, Haiquan, et al.
Published: (2024)
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks
by: Yu, Xiaodong, et al.
Published: (2023)
SocialEval: Evaluating Social Intelligence of Large Language Models
by: Zhou, Jinfeng, et al.
Published: (2025)
CriticEval: Evaluating Large Language Model as Critic
by: Lan, Tian, et al.
Published: (2024)
LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models
by: Sun, Chongyan, et al.
Published: (2024)
MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
by: Jin, Haoan, et al.
Published: (2025)
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
by: Shu, Dong, et al.
Published: (2024)
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
by: Liu, Yuxuan, et al.
Published: (2024)
LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
by: Ren, Huimin, et al.
Published: (2025)
Hallucination Detection and Evaluation of Large Language Model
by: Zhang, Chenggong, et al.
Published: (2025)
From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs
by: Joshi, Ratnesh Kumar, et al.
Published: (2024)
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking
by: Yang, Shuo, et al.
Published: (2025)
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
by: Tang, Liyan, et al.
Published: (2024)
Unified Hallucination Detection for Multimodal Large Language Models
by: Chen, Xiang, et al.
Published: (2024)
MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
by: Park, Dojun, et al.
Published: (2024)
A Unified Hallucination Mitigation Framework for Large Vision-Language Models
by: Chang, Yue, et al.
Published: (2024)
Hallucination Detection with Small Language Models
by: Cheung, Ming
Published: (2025)