:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhehao; Xu, Weijie; Wu, Fanyou; Reddy, Chandan K.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.08054

Similar Items

Think Before Refusal : Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
by: Si, Shengyun, et al.
Published: (2025)

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
by: Wan, Yixin, et al.
Published: (2023)

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
by: Yuan, Youliang, et al.
Published: (2024)

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
by: Zhang, Zhehao, et al.
Published: (2025)

Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation
by: Wu, Fanyou, et al.
Published: (2024)

Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
by: Kabra, Sanchit, et al.
Published: (2025)

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
by: Fang, Xi, et al.
Published: (2025)

HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications
by: Xu, Weijie, et al.
Published: (2024)

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
by: Liu, Zhenhua, et al.
Published: (2024)

Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy
by: Jiang, Eric Hanchen, et al.
Published: (2025)

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
by: Lu, Yuxiao, et al.
Published: (2026)

ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
by: Zhang, Haonan, et al.
Published: (2025)

Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
by: Pan, Wenbo, et al.
Published: (2025)

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
by: Xu, Weijie, et al.
Published: (2025)

OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)

LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
by: Abhyankar, Nikhil, et al.
Published: (2025)

Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
by: Noever, David, et al.
Published: (2025)

Mitigating Selection Bias with Node Pruning and Auxiliary Options
by: Choi, Hyeong Kyu, et al.
Published: (2024)

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs
by: Guo, Xiaoyong, et al.
Published: (2026)

H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables
by: Abhyankar, Nikhil, et al.
Published: (2024)

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
by: Sun, Zhongxiang, et al.
Published: (2026)

SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
by: Xu, Weijie, et al.
Published: (2025)

Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
by: Chhabra, Vishnu Kabir, et al.
Published: (2025)

RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables
by: Abhyankar, Nikhil, et al.
Published: (2025)

Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
by: von Recum, Alexander, et al.
Published: (2024)

Aligning Reasoning LLMs for Materials Discovery with Physics-aware Rejection Sampling
by: Hyun, Lee, et al.
Published: (2025)

Where Do Reasoning Models Refuse?
by: Yamaguchi, Kureha, et al.
Published: (2025)

TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning
by: Xu, Zhangchen, et al.
Published: (2025)

When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
by: Anonto, Riad Ahmed, et al.
Published: (2025)

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
by: Asif, Sadia, et al.
Published: (2026)

STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules
by: Wu, Di, et al.
Published: (2026)

Aligned at the Start: Conceptual Groupings in LLM Embeddings
by: Khatir, Mehrdad, et al.
Published: (2024)

Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
by: Wang, Qiongqiong, et al.
Published: (2025)

Improving Grammatical Error Correction via Contextual Data Augmentation
by: Wang, Yixuan, et al.
Published: (2024)

Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
by: Pucci, Giulia, et al.
Published: (2026)

HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)

Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
by: Xu, Qianqiao, et al.
Published: (2024)

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
by: García-Ferrero, Iker, et al.
Published: (2025)

Mission Impossible: Feedback-Guided Dynamic Interactive Planning for Improving Reasoning on LLMs
by: Yan, Dong, et al.
Published: (2025)