:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Haoren; Chen, Tianyi; Wang, Zhen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.04716

Similar Items

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)

POINTS-GUI-G: GUI-Grounding Journey
by: Zhao, Zhongyin, et al.
Published: (2026)

SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing
by: Jing, Hongyi, et al.
Published: (2025)

Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
by: Li, Zhecheng, et al.
Published: (2025)

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
by: Xie, Peng, et al.
Published: (2024)

AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
by: Li, Hongxin, et al.
Published: (2025)

Phi-Ground Tech Report: Advancing Perception in GUI Grounding
by: Zhang, Miaosen, et al.
Published: (2025)

Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation
by: Chu, Tianyi, et al.
Published: (2024)

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
by: Li, Hongxin, et al.
Published: (2026)

Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding
by: Kumbhar, Shrinidhi, et al.
Published: (2026)

GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning
by: Li, Junlong, et al.
Published: (2026)

Learning GUI Grounding with Spatial Reasoning from Visual Feedback
by: Zhao, Yu, et al.
Published: (2025)

Robust Identity Perceptual Watermark Against Deepfake Face Swapping
by: Wang, Tianyi, et al.
Published: (2023)

Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion
by: Ma, Longhui, et al.
Published: (2026)

GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
by: Ye, Xianhang, et al.
Published: (2025)

How Auxiliary Reasoning Unleashes GUI Grounding in VLMs
by: Li, Weiming, et al.
Published: (2025)

Robustness of Vision Language Models Against Split-Image Harmful Input Attacks
by: Rashid, Md Rafi Ur, et al.
Published: (2026)

Robust Spiking Neural Networks Against Adversarial Attacks
by: Wang, Shuai, et al.
Published: (2026)

Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning
by: Xu, Hai-Ming, et al.
Published: (2024)

Unified Prompt Attack Against Text-to-Image Generation Models
by: Peng, Duo, et al.
Published: (2025)

AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
by: Pei, Siqi, et al.
Published: (2026)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
by: Fan, Yue, et al.
Published: (2024)

MVP: Multiple View Prediction Improves GUI Grounding
by: Zhang, Yunzhu, et al.
Published: (2025)

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
by: Tang, Fei, et al.
Published: (2025)

SIFT-Graph: Benchmarking Multimodal Defense Against Image Adversarial Attacks With Robust Feature Graph
by: He, Jingjie, et al.
Published: (2025)

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
by: Fan, Yue, et al.
Published: (2025)

R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding
by: Park, Joonhyung, et al.
Published: (2025)

ResGuard: Enhancing Robustness Against Known Original Attacks in Deep Watermarking
by: Wang, Hanyi, et al.
Published: (2026)

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
by: Zhao, Ruisi, et al.
Published: (2026)

Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning
by: Zhang, Shaojie, et al.
Published: (2025)

Towards Unified Robustness Against Both Backdoor and Adversarial Attacks
by: Niu, Zhenxing, et al.
Published: (2024)

Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers
by: Nokabadi, Fatemeh Nourilenjan, et al.
Published: (2024)

Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior
by: Yu, Yi, et al.
Published: (2024)

Membership Inference Attack Against Masked Image Modeling
by: Li, Zheng, et al.
Published: (2024)

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
by: Chen, Liangyu, et al.
Published: (2025)

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
by: Wang, Suyuchen, et al.
Published: (2025)

GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)

BAMI: Training-Free Bias Mitigation in GUI Grounding
by: Zhang, Borui, et al.
Published: (2026)

Towards Robust Content Watermarking Against Removal and Forgery Attacks
by: Zhu, Yifan, et al.
Published: (2026)