Library Catalog

Bibliographic Details
Main Authors:	Shen, Chenglei, Sun, Zhongxiang, Shi, Teng, Zhang, Xiao, Xu, Jun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.04530

Similar Items

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
by: Sun, Zhongxiang, et al.
Published: (2026)

On the Decision-Making Abilities in Role-Playing using Large Language Models
by: Shen, Chenglei, et al.
Published: (2024)

LLaDA-Rec: Discrete Diffusion for Parallel Semantic ID Generation in Generative Recommendation
by: Shi, Teng, et al.
Published: (2025)

Effective In-Context Example Selection through Data Compression
by: Sun, Zhongxiang, et al.
Published: (2024)

SteerX: Disentangled Steering for LLM Personalization
by: Zhao, Xiaoyan, et al.
Published: (2025)

MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
by: Qin, Weicong, et al.
Published: (2025)

Trigger$^3$: Refining Query Correction via Adaptive Model Selector
by: Zhang, Kepu, et al.
Published: (2024)

Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
by: Sun, Zhongxiang, et al.
Published: (2025)

An Explicit Syllogistic Legal Reasoning Framework for Large Language Models
by: Zhang, Kepu, et al.
Published: (2025)

Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
by: Si, Chenglei, et al.
Published: (2023)

DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness
by: Bhatnagar, Rohan, et al.
Published: (2026)

MoRE: A Mixture of Reflectors Framework for Large Language Model-Based Sequential Recommendation
by: Qin, Weicong, et al.
Published: (2024)

Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm
by: Wang, Haoyu, et al.
Published: (2026)

Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)

LargePiG: Your Large Language Model is Secretly a Pointer Generator
by: Sun, Zhongxiang, et al.
Published: (2024)

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
by: Sun, Zhongxiang, et al.
Published: (2024)

Logic Rules as Explanations for Legal Case Retrieval
by: Sun, Zhongxiang, et al.
Published: (2024)

TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)

Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey
by: Qin, Weicong, et al.
Published: (2024)

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
by: Lu, Yuxiao, et al.
Published: (2026)

PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization
by: Zhang, Kepu, et al.
Published: (2025)

Disentangled VAD Representations via a Variational Framework for Political Stance Detection
by: Xu, Beiyu, et al.
Published: (2025)

Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience
by: Sun, Zhongxiang, et al.
Published: (2026)

Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning
by: Wu, Tianyi, et al.
Published: (2025)

SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
by: He, Zirui, et al.
Published: (2025)

ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
by: Sun, Zhongxiang, et al.
Published: (2025)

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)

CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation
by: Tong, Zhao, et al.
Published: (2026)

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
by: Wang, Tong, et al.
Published: (2026)

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
by: Wang, Tianlong, et al.
Published: (2024)

The Cylindrical Representation Hypothesis for Language Model Steering
by: Gao, Lang, et al.
Published: (2026)

Improved Representation Steering for Language Models
by: Wu, Zhengxuan, et al.
Published: (2025)

Training-free Truthfulness Detection via Value Vectors in LLMs
by: Liu, Runheng, et al.
Published: (2025)

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
by: Qi, Tianhao, et al.
Published: (2024)

Representational and Behavioral Stability of Truth in Large Language Models
by: Dies, Samantha, et al.
Published: (2025)

In-Distribution Steering: Balancing Control and Coherence in Language Model Generation
by: Vogels, Arthur, et al.
Published: (2025)

How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs
by: Adarsh, Shivam, et al.
Published: (2026)

Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection
by: Pu, Xiao, et al.
Published: (2026)

LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
by: Bi, Jinhe, et al.
Published: (2024)

SteerRM: Debiasing Reward Models via Sparse Autoencoders
by: Sun, Mengyuan, et al.
Published: (2026)