Library Catalog

Bibliographic Details
Main Authors:	Xu, Haolei, Hong, Haiwen, Li, Hongxing, Zhou, Rui, Zhang, Yang, Huang, Longtao, Xue, Hui, Shen, Yongliang, Lu, Weiming, Zhuang, Yueting
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.08541

Similar Items

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
by: Xu, Haolei, et al.
Published: (2025)

Let LRMs Break Free from Overthinking via Self-Braking Tuning
by: Zhao, Haoran, et al.
Published: (2025)

Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
by: Xu, Haolei, et al.
Published: (2025)

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
by: Zhang, Wenqi, et al.
Published: (2023)

Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning
by: Ge, Chendi, et al.
Published: (2025)

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
by: Tang, Fei, et al.
Published: (2025)

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
by: Xu, Ziwen, et al.
Published: (2026)

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
by: Zhang, Wenqi, et al.
Published: (2025)

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
by: Wang, Aozhe, et al.
Published: (2026)

Milestone-Guided Policy Learning for Long-Horizon Language Agents
by: Wang, Zixuan, et al.
Published: (2026)

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
by: Zhang, Wenqi, et al.
Published: (2024)

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
by: Xu, Ziwen, et al.
Published: (2026)

GroundAct: Can LLM Agents Ground Actions in Environmental States?
by: Wang, Zixuan, et al.
Published: (2025)

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
by: Li, Dingming, et al.
Published: (2025)

DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL
by: Ma, Haoyuan, et al.
Published: (2025)

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
by: Chen, Siqi, et al.
Published: (2025)

A Survey on (M)LLM-Based GUI Agents
by: Tang, Fei, et al.
Published: (2025)

InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
by: Yan, Yuchen, et al.
Published: (2025)

GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation
by: Tang, Fei, et al.
Published: (2024)

TaskBench: Benchmarking Large Language Models for Task Automation
by: Shen, Yongliang, et al.
Published: (2023)

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
by: Zhang, Wenqi, et al.
Published: (2024)

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
by: Tang, Fei, et al.
Published: (2026)

InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
by: Yan, Yuchen, et al.
Published: (2026)

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
by: Tang, Fei, et al.
Published: (2025)

LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
by: Wu, Xingyu, et al.
Published: (2025)

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
by: Zhang, Wenqi, et al.
Published: (2024)

Hierarchical Budget Policy Optimization for Adaptive Reasoning
by: Lyu, Shangke, et al.
Published: (2025)

simpleposter: a simple baseline for product poster generation
by: Cui, Benlei, et al.
Published: (2026)

LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
by: Zhuang, Yuan, et al.
Published: (2025)

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)

Maximum Score Routing For Mixture-of-Experts
by: Dong, Bowen, et al.
Published: (2025)

Insert or Attach: Taxonomy Completion via Box Embedding
by: Xue, Wei, et al.
Published: (2023)

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
by: Shen, Leyang, et al.
Published: (2024)

GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
by: Yuan, Fan, et al.
Published: (2025)

TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration
by: Cui, Benlei, et al.
Published: (2026)

Diffusion Probe: Generated Image Result Prediction Using CNN Probes
by: Cui, Benlei, et al.
Published: (2026)

Self-Distilled Agentic Reinforcement Learning
by: Lu, Zhengxi, et al.
Published: (2026)

MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts
by: Su, Zhenpeng, et al.
Published: (2024)