| Main Authors: | Wang, Shaokang, Fu, Pei, Zhang, Ruoceng, Zhang, Shaojie, Xi, Xiuwen, Yang, Jiahui, Qin, Bin, Huang, Ying, Luo, Zhenbo, Luan, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.18197 |
Similar Items
Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning
by: Zhang, Shaojie, et al.
Published: (2025)
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
by: Zhang, Shaojie, et al.
Published: (2025)
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
by: Sun, Yuchen, et al.
Published: (2026)
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)
IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation
by: Lyu, Jiahao, et al.
Published: (2026)
PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues
by: Qi, Yukun, et al.
Published: (2026)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks
by: Zheng, Zihan, et al.
Published: (2025)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
by: Xu, Boshen, et al.
Published: (2025)
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
by: Zhu, Linghao, et al.
Published: (2025)
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
by: Tang, Liujian, et al.
Published: (2025)
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
by: Fang, Yiyang, et al.
Published: (2026)
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
by: Qu, Heng, et al.
Published: (2026)
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
by: Gao, Longxi, et al.
Published: (2025)
HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation
by: Gu, Yi, et al.
Published: (2026)
GUI-PRA: Process Reward Agent for GUI Tasks
by: Xiong, Tao, et al.
Published: (2025)
GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
by: Zhao, Kangjia, et al.
Published: (2024)
AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use
by: Lyu, Yuanjie, et al.
Published: (2026)
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
by: Wang, Xiao, et al.
Published: (2024)
Visual Test-time Scaling for GUI Agent Grounding
by: Luo, Tiange, et al.
Published: (2025)
GAIA
by: Ernesto Cardenal
Published: (2007)
LatexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
by: Jin, Jian, et al.
Published: (2025)
Test-Time Training Done Right
by: Zhang, Tianyuan, et al.
Published: (2025)
Learning Regularities from Data using Spiking Functions: A Theory
by: Zhang, Canlin, et al.
Published: (2024)
Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA
by: Zheng, Yuanlei, et al.
Published: (2026)
Multiple q-zeta values and traces
by: Qin, Zhenbo
Published: (2025)
CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
by: Nie, Yixin, et al.
Published: (2026)
Pricing, Mergers, and Regulation Under the AI Flywheel Effect
by: Yuzhou Chen, et al.
Published: (2025)
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
by: Tan, Wenhui, et al.
Published: (2025)
GTA1: GUI Test-time Scaling Agent
by: Yang, Yan, et al.
Published: (2025)
Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models
by: Tan, Wenhui, et al.
Published: (2026)
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
by: Wu, Qinzhuo, et al.
Published: (2025)
AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale
by: Wang, Ziyang, et al.
Published: (2025)
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
by: Wang, Ye, et al.
Published: (2025)
DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
by: Qin, Peijia, et al.
Published: (2026)
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
by: Li, Jiaze, et al.
Published: (2026)
Deep Reinforcement Learning for Automated Web GUI Testing
by: Gu, Zhiyu, et al.
Published: (2025)
Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
by: Zhao, Cen Mia, et al.
Published: (2025)
Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding
by: Shao, Mingchen, et al.
Published: (2026)