| Main Authors: | Yuan, Youliang, Jiao, Wenxiang, Xie, Yuejin, Shen, Chihao, Tian, Menghan, Wang, Wenxuan, Huang, Jen-tse, He, Pinjia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.17455 |
Similar Items
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
by: Yuan, Youliang, et al.
Published: (2023)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models
by: Wan, Yuxuan, et al.
Published: (2024)
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
by: Yuan, Youliang, et al.
Published: (2024)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
by: Liu, Xiaoyuan, et al.
Published: (2024)
All Languages Matter: On the Multilingual Safety of Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)
Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards
by: Yuan, Youliang, et al.
Published: (2025)
PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning
by: Zhao, Yusong, et al.
Published: (2026)
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
by: Wang, Wenxuan, et al.
Published: (2024)
Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs
by: Huang, Jen-Tse, et al.
Published: (2025)
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
On the Shortcut Learning in Multilingual Neural Machine Translation
by: Wang, Wenxuan, et al.
Published: (2024)
Revisiting the Reliability of Psychological Scales on Large Language Models
by: Huang, Jen-tse, et al.
Published: (2023)
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
by: Huang, Jen-tse, et al.
Published: (2024)
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)
Identifying the Achilles' Heel: An Iterative Method for Dynamically Uncovering Factual Errors in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2024)
Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
by: Huang, Jen-tse, et al.
Published: (2023)
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
by: Zhao, Sihang, et al.
Published: (2024)
Learning to Ask: When LLM Agents Meet Unclear Instruction
by: Wang, Wenxuan, et al.
Published: (2024)
On the Failure of Latent State Persistence in Large Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench
by: Huang, Jen-tse, et al.
Published: (2023)
Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation
by: Huang, Jen-tse, et al.
Published: (2026)
New Job, New Gender? Measuring the Social Bias in Image Generation Models
by: Wang, Wenxuan, et al.
Published: (2024)
SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs
by: Zhao, Sihang, et al.
Published: (2026)
AI Sees Your Location, But With A Bias Toward The Wealthy World
by: Huang, Jingyuan, et al.
Published: (2025)
Evaluating Proactive Risk Awareness of Large Language Models
by: Luo, Xuan, et al.
Published: (2026)
How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO
by: Ng, Man Tik, et al.
Published: (2024)
FairCoder: Evaluating Social Bias of LLMs in Code Generation
by: Du, Yongkang, et al.
Published: (2025)
Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
by: Huang, Jen-tse, et al.
Published: (2025)
The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models
by: Xiao, Yunze, et al.
Published: (2026)
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
by: Liu, Emmy, et al.
Published: (2026)
The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
by: Zhou, Jiaxu, et al.
Published: (2025)
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making
by: Huang, Jen-tse, et al.
Published: (2026)
BackportBench: A Multilingual Benchmark for Automated Backporting of Patches
by: Zhong, Zhiqing, et al.
Published: (2025)
Diversity-Enhanced Reasoning for Subjective Questions
by: Wang, Yumeng, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
by: Wang, Wenxuan, et al.
Published: (2022)
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
by: Liang, Tian, et al.
Published: (2023)
ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?
by: Li, Shuqing, et al.
Published: (2025)
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
by: Jin, Jiarui, et al.
Published: (2026)