| Main Authors: | Lei, Bin, Kang, Weitai, Zhang, Zijian, Chen, Winson, Xie, Xi, Zuo, Shan, Xie, Mimi, Payani, Ali, Hong, Mingyi, Yan, Yan, Ding, Caiwen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.10887 |
Similar Items
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
by: Lei, Bin, et al.
Published: (2024)
StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning
by: Li, Shiyang, et al.
Published: (2026)
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)
CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
by: Zhang, Zijian, et al.
Published: (2025)
Observer-Based Data-Driven Consensus Control for Nonlinear Multi-Agent Systems against DoS and FDI attacks
by: Zhang, Yi, et al.
Published: (2025)
CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
by: Li, Shiyang, et al.
Published: (2026)
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
by: Lei, Bin, et al.
Published: (2024)
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
by: Jia, Chengyou, et al.
Published: (2024)
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
by: Wu, Junyi, et al.
Published: (2024)
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
by: Xie, Xi, et al.
Published: (2024)
Scaling Generalist Data-Analytic Agents
by: Qiao, Shuofei, et al.
Published: (2025)
Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration
by: Chen, Le, et al.
Published: (2024)
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
by: Yang, Bowen, et al.
Published: (2026)
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
by: Wu, Wei, et al.
Published: (2026)
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
by: Liu, Yuhang, et al.
Published: (2025)
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
by: Wu, Zhiyong, et al.
Published: (2024)
AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing
by: Zhou, Tong, et al.
Published: (2024)
FHE-Agent: Automating CKKS Configuration for Practical Encrypted Inference via an LLM-Guided Agentic Framework
by: Xu, Nuo, et al.
Published: (2025)
On the Faithfulness of Vision Transformer Explanations
by: Wu, Junyi, et al.
Published: (2024)
Visual Grounding with Attention-Driven Constraint Balancing
by: Kang, Weitai, et al.
Published: (2024)
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
by: Kang, Weitai, et al.
Published: (2024)
DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents
by: Wang, Shengqin, et al.
Published: (2026)
Coding Agents with Multimodal Browsing are Generalist Problem Solvers
by: Soni, Aditya Bharat, et al.
Published: (2025)
Towards Enterprise-Ready Computer Using Generalist Agent
by: Marreed, Sami, et al.
Published: (2025)
An Embodied Generalist Agent in 3D World
by: Huang, Jiangyong, et al.
Published: (2023)
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
by: Agashe, Saaket, et al.
Published: (2025)
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
by: Chen, Le, et al.
Published: (2025)
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
by: Liu, Yuchi, et al.
Published: (2024)
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
by: Szot, Andrew, et al.
Published: (2024)
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
by: Wang, Zihao, et al.
Published: (2025)
FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
by: Lei, Bin, et al.
Published: (2023)
VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
by: Kang, Weitai, et al.
Published: (2025)
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
by: Kang, Weitai, et al.
Published: (2024)
3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation
by: Chen, Wenxin, et al.
Published: (2025)
Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis
by: Thorat, Kiran, et al.
Published: (2023)