| Main Authors: | Guo, Yuan; Miao, Tingjia; Wu, Zheng; Cheng, Pengzhou; Zhou, Ming; Zhang, Zhuosheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.08972 |
Similar Items
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
by: Cheng, Pengzhou, et al.
Published: (2025)
GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents
by: Wu, Zheng, et al.
Published: (2025)
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
by: Dong, Lingzhong, et al.
Published: (2025)
Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks
by: Wu, Zongru, et al.
Published: (2025)
Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
by: Wu, Zongru, et al.
Published: (2024)
Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
by: Wu, Zongru, et al.
Published: (2024)
Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
by: Cheng, Pengzhou, et al.
Published: (2025)
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents
by: Wu, Zheng, et al.
Published: (2025)
OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents
by: Wu, Zheng, et al.
Published: (2026)
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
by: Wu, Zongru, et al.
Published: (2025)
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
by: Cheng, Pengzhou, et al.
Published: (2024)
SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer
by: Cheng, Pengzhou, et al.
Published: (2024)
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
by: Diao, Lingxiao, et al.
Published: (2025)
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs
by: Zhou, Junting, et al.
Published: (2025)
Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
by: Ju, Tianjie, et al.
Published: (2024)
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
by: Cheng, Pengzhou, et al.
Published: (2025)
Faithful Mobile GUI Agents with Guided Advantage Estimator
by: Hu, Haowen, et al.
Published: (2026)
Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents
by: Wu, Zheng, et al.
Published: (2025)
On the Adaptive Psychological Persuasion of Large Language Models
by: Ju, Tianjie, et al.
Published: (2025)
When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements
by: Ju, Tianjie, et al.
Published: (2025)
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
by: Ma, Xinbei, et al.
Published: (2024)
You Only Look at Screens: Multimodal Chain-of-Action Agents
by: Zhang, Zhuosheng, et al.
Published: (2023)
Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method
by: Xia, Tian, et al.
Published: (2024)
ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
by: Song, Yuanyi, et al.
Published: (2025)
Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning
by: Wu, Zheng, et al.
Published: (2026)
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems
by: Zou, Anni, et al.
Published: (2024)
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
by: Yuan, Tongxin, et al.
Published: (2024)
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
by: Kong, Quyu, et al.
Published: (2025)
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
by: Ma, Xinbei, et al.
Published: (2024)
LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems
by: Wu, Siyu, et al.
Published: (2026)
Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems
by: Zhao, Haodong, et al.
Published: (2025)
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)
Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering
by: Wu, Yexin, et al.
Published: (2024)
On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency
by: Wang, Yiming, et al.
Published: (2026)
AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios
by: Chen, Kaiyuan, et al.
Published: (2026)
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
by: Sun, Qiushi, et al.
Published: (2025)
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
by: Deng, Shihan, et al.
Published: (2024)
Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models
by: Ju, Tianjie, et al.
Published: (2024)
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
by: Wang, Junyang, et al.
Published: (2024)
Bilingual Text-to-Motion Generation: A New Benchmark and Baselines
by: Weng, Wanjiang, et al.
Published: (2026)