:: Library Catalog

Bibliographic Details
Main Authors:	Lei, Bin; Kang, Weitai; Zhang, Zijian; Chen, Winson; Xie, Xi; Zuo, Shan; Xie, Mimi; Payani, Ali; Hong, Mingyi; Yan, Yan; Ding, Caiwen
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.10887

Similar Items

MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
by: Lei, Bin, et al.
Published: (2024)

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning
by: Li, Shiyang, et al.
Published: (2026)

GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)

GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
by: Zhang, Zijian, et al.
Published: (2025)

Observer-Based Data-Driven Consensus Control for Nonlinear Multi-Agent Systems against DoS and FDI attacks
by: Zhang, Yi, et al.
Published: (2025)

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
by: Li, Shiyang, et al.
Published: (2026)

Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
by: Lei, Bin, et al.
Published: (2024)

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
by: Jia, Chengyou, et al.
Published: (2024)

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
by: Wu, Junyi, et al.
Published: (2024)

RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
by: Xie, Xi, et al.
Published: (2024)

Scaling Generalist Data-Analytic Agents
by: Qiao, Shuofei, et al.
Published: (2025)

Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration
by: Chen, Le, et al.
Published: (2024)

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)

ACTRESS: Active Retraining for Semi-supervised Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
by: Yang, Bowen, et al.
Published: (2026)

PresentAgent-2: Towards Generalist Multimodal Presentation Agents
by: Wu, Wei, et al.
Published: (2026)

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
by: Liu, Yuhang, et al.
Published: (2025)

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
by: Wu, Zhiyong, et al.
Published: (2024)

AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing
by: Zhou, Tong, et al.
Published: (2024)

FHE-Agent: Automating CKKS Configuration for Practical Encrypted Inference via an LLM-Guided Agentic Framework
by: Xu, Nuo, et al.
Published: (2025)

On the Faithfulness of Vision Transformer Explanations
by: Wu, Junyi, et al.
Published: (2024)

Visual Grounding with Attention-Driven Constraint Balancing
by: Kang, Weitai, et al.
Published: (2024)

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
by: Kang, Weitai, et al.
Published: (2024)

DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents
by: Wang, Shengqin, et al.
Published: (2026)

Coding Agents with Multimodal Browsing are Generalist Problem Solvers
by: Soni, Aditya Bharat, et al.
Published: (2025)

Towards Enterprise-Ready Computer Using Generalist Agent
by: Marreed, Sami, et al.
Published: (2025)

An Embodied Generalist Agent in 3D World
by: Huang, Jiangyong, et al.
Published: (2023)

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
by: Agashe, Saaket, et al.
Published: (2025)

Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
by: Chen, Le, et al.
Published: (2025)

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
by: Liu, Yuchi, et al.
Published: (2024)

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
by: Szot, Andrew, et al.
Published: (2024)

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
by: Wang, Zihao, et al.
Published: (2025)

FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
by: Lei, Bin, et al.
Published: (2023)

VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
by: Kang, Weitai, et al.
Published: (2025)

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
by: Kang, Weitai, et al.
Published: (2024)

3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation
by: Chen, Wenxin, et al.
Published: (2025)

Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis
by: Thorat, Kiran, et al.
Published: (2023)