| Main Authors: | Zhang, Zhehao, Xu, Weijie, Wu, Fanyou, Reddy, Chandan K. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.08054 |
Similar Items
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
by: Si, Shengyun, et al.
Published: (2025)
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
by: Wan, Yixin, et al.
Published: (2023)
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
by: Yuan, Youliang, et al.
Published: (2024)
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
by: Zhang, Zhehao, et al.
Published: (2025)
Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation
by: Wu, Fanyou, et al.
Published: (2024)
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
by: Kabra, Sanchit, et al.
Published: (2025)
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
by: Fang, Xi, et al.
Published: (2025)
HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications
by: Xu, Weijie, et al.
Published: (2024)
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
by: Liu, Zhenhua, et al.
Published: (2024)
Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy
by: Jiang, Eric Hanchen, et al.
Published: (2025)
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
by: Lu, Yuxiao, et al.
Published: (2026)
ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
by: Zhang, Haonan, et al.
Published: (2025)
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
by: Pan, Wenbo, et al.
Published: (2025)
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
by: Xu, Weijie, et al.
Published: (2025)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
by: Abhyankar, Nikhil, et al.
Published: (2025)
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
by: Noever, David, et al.
Published: (2025)
Mitigating Selection Bias with Node Pruning and Auxiliary Options
by: Choi, Hyeong Kyu, et al.
Published: (2024)
From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs
by: Guo, Xiaoyong, et al.
Published: (2026)
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables
by: Abhyankar, Nikhil, et al.
Published: (2024)
When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
by: Sun, Zhongxiang, et al.
Published: (2026)
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
by: Xu, Weijie, et al.
Published: (2025)
Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
by: Chhabra, Vishnu Kabir, et al.
Published: (2025)
RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables
by: Abhyankar, Nikhil, et al.
Published: (2025)
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
by: von Recum, Alexander, et al.
Published: (2024)
Aligning Reasoning LLMs for Materials Discovery with Physics-aware Rejection Sampling
by: Hyun, Lee, et al.
Published: (2025)
Where Do Reasoning Models Refuse?
by: Yamaguchi, Kureha, et al.
Published: (2025)
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning
by: Xu, Zhangchen, et al.
Published: (2025)
When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
by: Anonto, Riad Ahmed, et al.
Published: (2025)
RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
by: Asif, Sadia, et al.
Published: (2026)
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules
by: Wu, Di, et al.
Published: (2026)
Aligned at the Start: Conceptual Groupings in LLM Embeddings
by: Khatir, Mehrdad, et al.
Published: (2024)
Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
by: Wang, Qiongqiong, et al.
Published: (2025)
Improving Grammatical Error Correction via Contextual Data Augmentation
by: Wang, Yixuan, et al.
Published: (2024)
Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)
Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
by: Pucci, Giulia, et al.
Published: (2026)
HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
by: Xu, Qianqiao, et al.
Published: (2024)
Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
by: García-Ferrero, Iker, et al.
Published: (2025)
Mission Impossible: Feedback-Guided Dynamic Interactive Planning for Improving Reasoning on LLMs
by: Yan, Dong, et al.
Published: (2025)