| Main Authors: | Yan, Siyu, Zeng, Long, Wu, Xuecheng, Han, Chengcheng, Zhang, Kongcheng, Peng, Chong, Cao, Xuezhi, Cai, Xunliang, Guo, Chenjuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.14651 |
Similar Items
HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs
by: Zhang, Zijian, et al.
Published: (2025)
TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs
by: Zhang, Zijian, et al.
Published: (2025)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement
by: Ding, Peng, et al.
Published: (2025)
Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack
by: Ding, Peng, et al.
Published: (2025)
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis
by: Wu, Xiaorui, et al.
Published: (2025)
ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents
by: Li, Zhigen, et al.
Published: (2024)
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs
by: Ding, Peng, et al.
Published: (2024)
RedCoder: Automated Multi-Turn Red Teaming for Code LLMs
by: Mo, Wenjie Jacky, et al.
Published: (2025)
Red Teaming Language Models for Processing Contradictory Dialogues
by: Wen, Xiaofei, et al.
Published: (2024)
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
by: Wang, Xihuai, et al.
Published: (2025)
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
by: Jiayang, Cheng, et al.
Published: (2026)
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
by: Nie, Dujun, et al.
Published: (2026)
Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation
by: Fu, Lingyue, et al.
Published: (2025)
Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)
Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability
by: Wang, Jiaming, et al.
Published: (2025)
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
by: Yu, Erxin, et al.
Published: (2024)
Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
by: Zhou, Zhenhong, et al.
Published: (2024)
ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation
by: Wei, Minnan, et al.
Published: (2026)
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
by: Cao, Hongye, et al.
Published: (2025)
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
by: Guo, Ruohao, et al.
Published: (2025)
MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues
by: Liu, Zheyuan, et al.
Published: (2026)
Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue
by: Jia, Yuhang, et al.
Published: (2026)
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
by: Lu, Yi, et al.
Published: (2025)
A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
by: Feier, Andrei Marian, et al.
Published: (2026)
Instance-level Randomization: Toward More Stable LLM Evaluations
by: Li, Yiyang, et al.
Published: (2025)
Physics-Constrained Neural Dynamics: A Unified Manifold Framework for Large-Scale Power Flow Computation
by: Liu, Xuezhi
Published: (2025)
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
by: Liu, Shuqi, et al.
Published: (2025)
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions
by: An, Shengnan, et al.
Published: (2025)
Human-Robot Red Teaming for Safety-Aware Reasoning
by: Sheetz, Emily, et al.
Published: (2025)
Red Teaming AI Red Teaming
by: Majumdar, Subhabrata, et al.
Published: (2025)
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
by: Deng, Wesley Hanwen, et al.
Published: (2026)
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
by: Jian, Ai, et al.
Published: (2026)
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
by: Wan, Yanming, et al.
Published: (2025)
SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models
by: Diao, Muxi, et al.
Published: (2024)
LALM-as-a-Judge: Benchmarking Large Audio-Language Models for Safety Evaluation in Multi-Turn Spoken Dialogues
by: Ivry, Amir, et al.
Published: (2026)
A Red Teaming Roadmap Towards System-Level Safety
by: Wang, Zifan, et al.
Published: (2025)
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
by: Tedeschi, Simone, et al.
Published: (2024)
AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
by: Gautam, Tanmay, et al.
Published: (2026)
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)