Bibliographic Details
Main Authors:	Yuan, Youliang; Jiao, Wenxiang; Xie, Yuejin; Shen, Chihao; Tian, Menghan; Wang, Wenxuan; Huang, Jen-tse; He, Pinjia
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.17455

Similar Items

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
by: Yuan, Youliang, et al.
Published: (2023)

LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models
by: Wan, Yuxuan, et al.
Published: (2024)

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
by: Yuan, Youliang, et al.
Published: (2024)

Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)

Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
by: Liu, Xiaoyuan, et al.
Published: (2024)

All Languages Matter: On the Multilingual Safety of Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)

Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards
by: Yuan, Youliang, et al.
Published: (2025)

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning
by: Zhao, Yusong, et al.
Published: (2026)

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
by: Wang, Wenxuan, et al.
Published: (2024)

Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs
by: Huang, Jen-Tse, et al.
Published: (2025)

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
by: Huang, Jen-tse, et al.
Published: (2025)

On the Shortcut Learning in Multilingual Neural Machine Translation
by: Wang, Wenxuan, et al.
Published: (2024)

Revisiting the Reliability of Psychological Scales on Large Language Models
by: Huang, Jen-tse, et al.
Published: (2023)

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
by: Huang, Jen-tse, et al.
Published: (2024)

Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)

Identifying the Achilles' Heel: An Iterative Method for Dynamically Uncovering Factual Errors in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2024)

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
by: Huang, Jen-tse, et al.
Published: (2023)

Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
by: Zhao, Sihang, et al.
Published: (2024)

Learning to Ask: When LLM Agents Meet Unclear Instruction
by: Wang, Wenxuan, et al.
Published: (2024)

On the Failure of Latent State Persistence in Large Language Models
by: Huang, Jen-tse, et al.
Published: (2025)

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench
by: Huang, Jen-tse, et al.
Published: (2023)

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation
by: Huang, Jen-tse, et al.
Published: (2026)

New Job, New Gender? Measuring the Social Bias in Image Generation Models
by: Wang, Wenxuan, et al.
Published: (2024)

SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs
by: Zhao, Sihang, et al.
Published: (2026)

AI Sees Your Location, But With A Bias Toward The Wealthy World
by: Huang, Jingyuan, et al.
Published: (2025)

Evaluating Proactive Risk Awareness of Large Language Models
by: Luo, Xuan, et al.
Published: (2026)

How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO
by: Ng, Man Tik, et al.
Published: (2024)

FairCoder: Evaluating Social Bias of LLMs in Code Generation
by: Du, Yongkang, et al.
Published: (2025)

Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
by: Huang, Jen-tse, et al.
Published: (2025)

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models
by: Xiao, Yunze, et al.
Published: (2026)

What do Language Models Learn and When? The Implicit Curriculum Hypothesis
by: Liu, Emmy, et al.
Published: (2026)

The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
by: Zhou, Jiaxu, et al.
Published: (2025)

Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making
by: Huang, Jen-tse, et al.
Published: (2026)

BackportBench: A Multilingual Benchmark for Automated Backporting of Patches
by: Zhong, Zhiqing, et al.
Published: (2025)

Diversity-Enhanced Reasoning for Subjective Questions
by: Wang, Yumeng, et al.
Published: (2025)

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)

Understanding and Mitigating the Uncertainty in Zero-Shot Translation
by: Wang, Wenxuan, et al.
Published: (2022)

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
by: Liang, Tian, et al.
Published: (2023)

ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?
by: Li, Shuqing, et al.
Published: (2025)

AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
by: Jin, Jiarui, et al.
Published: (2026)