:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Yuan; Miao, Tingjia; Wu, Zheng; Cheng, Pengzhou; Zhou, Ming; Zhang, Zhuosheng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.08972

Similar Items

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
by: Cheng, Pengzhou, et al.
Published: (2025)

GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents
by: Wu, Zheng, et al.
Published: (2025)

Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
by: Dong, Lingzhong, et al.
Published: (2025)

Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks
by: Wu, Zongru, et al.
Published: (2025)

Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
by: Wu, Zongru, et al.
Published: (2024)

Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
by: Wu, Zongru, et al.
Published: (2024)

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
by: Cheng, Pengzhou, et al.
Published: (2025)

VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents
by: Wu, Zheng, et al.
Published: (2025)

OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents
by: Wu, Zheng, et al.
Published: (2026)

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
by: Wu, Zongru, et al.
Published: (2025)

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
by: Cheng, Pengzhou, et al.
Published: (2024)

SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer
by: Cheng, Pengzhou, et al.
Published: (2024)

GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
by: Diao, Lingxiao, et al.
Published: (2025)

ScholarSearch: Benchmarking Scholar Searching Ability of LLMs
by: Zhou, Junting, et al.
Published: (2025)

Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
by: Ju, Tianjie, et al.
Published: (2024)

OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
by: Cheng, Pengzhou, et al.
Published: (2025)

Faithful Mobile GUI Agents with Guided Advantage Estimator
by: Hu, Haowen, et al.
Published: (2026)

Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents
by: Wu, Zheng, et al.
Published: (2025)

On the Adaptive Psychological Persuasion of Large Language Models
by: Ju, Tianjie, et al.
Published: (2025)

When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements
by: Ju, Tianjie, et al.
Published: (2025)

CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
by: Ma, Xinbei, et al.
Published: (2024)

You Only Look at Screens: Multimodal Chain-of-Action Agents
by: Zhang, Zhuosheng, et al.
Published: (2023)

Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method
by: Xia, Tian, et al.
Published: (2024)

ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
by: Song, Yuanyi, et al.
Published: (2025)

Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning
by: Wu, Zheng, et al.
Published: (2026)

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems
by: Zou, Anni, et al.
Published: (2024)

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
by: Yuan, Tongxin, et al.
Published: (2024)

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
by: Kong, Quyu, et al.
Published: (2025)

Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
by: Ma, Xinbei, et al.
Published: (2024)

LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems
by: Wu, Siyu, et al.
Published: (2026)

Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems
by: Zhao, Haodong, et al.
Published: (2025)

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)

Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering
by: Wu, Yexin, et al.
Published: (2024)

On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency
by: Wang, Yiming, et al.
Published: (2026)

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios
by: Chen, Kaiyuan, et al.
Published: (2026)

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
by: Sun, Qiushi, et al.
Published: (2025)

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
by: Deng, Shihan, et al.
Published: (2024)

Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models
by: Ju, Tianjie, et al.
Published: (2024)

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
by: Wang, Junyang, et al.
Published: (2024)

Bilingual Text-to-Motion Generation: A New Benchmark and Baselines
by: Weng, Wanjiang, et al.
Published: (2026)