| Main Authors: | Ma, Xinbei, Ma, Ruotian, Chen, Xingyu, Shi, Zhengliang, Wang, Mengru, Huang, Jen-tse, Yang, Qu, Wang, Wenxuan, Ye, Fanghua, Jiang, Qingxuan, Zhou, Mengfei, Zhang, Zhuosheng, Wang, Rui, Zhao, Hai, Tu, Zhaopeng, Li, Xiaolong, Linus |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.26126 |
Similar Items
Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare
by: Shi, Zhengliang, et al.
Published: (2025)
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
by: Yi, Zihao, et al.
Published: (2025)
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
by: Wang, Yue, et al.
Published: (2025)
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
by: Zhang, Bang, et al.
Published: (2025)
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
by: Liu, Xiaoyuan, et al.
Published: (2024)
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
by: Ma, Xinbei, et al.
Published: (2024)
On the Shortcut Learning in Multilingual Neural Machine Translation
by: Wang, Wenxuan, et al.
Published: (2024)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents
by: Yang, Ruihan, et al.
Published: (2026)
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
by: Wang, Mengru, et al.
Published: (2025)
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
by: Chen, Jiaqi, et al.
Published: (2025)
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
by: Yuan, Youliang, et al.
Published: (2023)
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
by: Wang, Wenxuan, et al.
Published: (2024)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)
All Languages Matter: On the Multilingual Safety of Large Language Models
by: Wang, Wenxuan, et al.
Published: (2023)
MEGen: Generative Backdoor into Large Language Models via Model Editing
by: Qiu, Jiyang, et al.
Published: (2024)
Plan-over-Graph: Towards Parallelable LLM Agent Schedule
by: Zhang, Shiqi, et al.
Published: (2025)
Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness
by: Qiu, Jiyang, et al.
Published: (2025)
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
by: Yuan, Youliang, et al.
Published: (2024)
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
by: Huang, Jen-tse, et al.
Published: (2024)
On the Robustness of Editing Large Language Models
by: Ma, Xinbei, et al.
Published: (2024)
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
by: Ma, Xinbei, et al.
Published: (2024)
On the Failure of Latent State Persistence in Large Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning
by: Wu, Zheng, et al.
Published: (2026)
Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench
by: Huang, Jen-tse, et al.
Published: (2023)
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards
by: Liu, Cheng, et al.
Published: (2025)
How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition
by: Yao, Yao, et al.
Published: (2025)
Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
by: Huang, Jen-tse, et al.
Published: (2023)
Houston Food Bank's Hunger Game Thrives on Competition
Published: (2024)
Identifying the Achilles' Heel: An Iterative Method for Dynamically Uncovering Factual Errors in Large Language Models
by: Wang, Wenxuan, et al.
Published: (2024)
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
by: Wang, Yue, et al.
Published: (2025)
ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?
by: Li, Shuqing, et al.
Published: (2025)
RaSA: Rank-Sharing Low-Rank Adaptation
by: He, Zhiwei, et al.
Published: (2025)
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
by: He, Zhiwei, et al.
Published: (2024)
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
AI Sees Your Location, But With A Bias Toward The Wealthy World
by: Huang, Jingyuan, et al.
Published: (2025)
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
by: Yang, Ruihan, et al.
Published: (2025)
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
by: Wang, Wenxuan, et al.
Published: (2022)
Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation
by: Zhang, Weiming, et al.
Published: (2026)