| Main Authors: | Zhang, Fujie; Yu, Peiqi; Yi, Biao; Zhang, Baolei; Li, Tong; Liu, Zheli |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.04847 |
Similar Items
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
by: Yi, Biao, et al.
Published: (2025)
BadActs: A Universal Backdoor Defense in the Activation Space
by: Yi, Biao, et al.
Published: (2024)
Gradient Surgery for Safe LLM Fine-Tuning
by: Yi, Biao, et al.
Published: (2025)
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
by: Yi, Biao, et al.
Published: (2025)
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
by: Yi, Biao, et al.
Published: (2025)
Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models
by: Song, Yusheng, et al.
Published: (2025)
Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering
by: Ji, Yi, et al.
Published: (2025)
Confabulation: The Surprising Value of Large Language Model Hallucinations
by: Sui, Peiqi, et al.
Published: (2024)
Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
by: Su, Weihang, et al.
Published: (2024)
INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
by: Chen, Chao, et al.
Published: (2024)
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
by: van Dijk, Gijs
Published: (2026)
Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models
by: Binkowski, Jakub, et al.
Published: (2026)
Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models
by: Zhang, Yuji, et al.
Published: (2024)
Bolster Hallucination Detection via Prompt-Guided Data Augmentation
by: Li, Wenyun, et al.
Published: (2025)
Hallucination Detection and Evaluation of Large Language Model
by: Zhang, Chenggong, et al.
Published: (2025)
Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
by: Zhang, Baolei, et al.
Published: (2025)
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information
by: Jiang, Ming, et al.
Published: (2024)
HaluNet: Learning Hallucination Risk from Internal Signals in LLM Question Answering
by: Tong, Chaodong, et al.
Published: (2025)
Active Prompting with Chain-of-Thought for Large Language Models
by: Diao, Shizhe, et al.
Published: (2023)
Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation
by: Cheng, Jiahao, et al.
Published: (2025)
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
by: Chen, Yuyan, et al.
Published: (2024)
Mitigating Prompt-Induced Hallucinations in Large Language Models via Structured Reasoning
by: Hao, Jinbo, et al.
Published: (2026)
Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
by: Lu, Yifan, et al.
Published: (2025)
PretrainRL: Alleviating Factuality Hallucination of Large Language Models at the Beginning
by: Liu, Langming, et al.
Published: (2026)
PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning
by: Zou, Jiaru, et al.
Published: (2024)
Efficient Detection of Toxic Prompts in Large Language Models
by: Liu, Yi, et al.
Published: (2024)
Loki's Dance of Illusions: A Comprehensive Survey of Hallucination in Large Language Models
by: Li, Chaozhuo, et al.
Published: (2025)
Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models
by: Liu, Biao, et al.
Published: (2025)
Traceback of Poisoning Attacks to Retrieval-Augmented Generation
by: Zhang, Baolei, et al.
Published: (2025)
Hallucination Detection with the Internal Layers of LLMs
by: Preiß, Martin
Published: (2025)
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
by: Zhang, Yue, et al.
Published: (2023)
Calibrating Reasoning in Language Models with Internal Consistency
by: Xie, Zhihui, et al.
Published: (2024)
Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
by: Sato, Makoto
Published: (2025)
Critical Confabulation: Can LLMs Hallucinate for Social Good?
by: Sui, Peiqi, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)
HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models
by: Zhao, Guoshenghui, et al.
Published: (2026)
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
by: Huang, Yanwen, et al.
Published: (2025)
Scalable Token-Level Hallucination Detection in Large Language Models
by: Min, Rui, et al.
Published: (2026)
HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy
by: Xu, Fan, et al.
Published: (2025)
Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models
by: Halperin, Igor
Published: (2025)