| Main authors: | Agashe, Saaket; Wong, Kyle; Tu, Vincent; Yang, Jiachen; Li, Ang; Wang, Xin Eric |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2504.00906 |
Similar items
Agent S: An Open Agentic Framework that Uses Computers Like a Human
by Agashe, Saaket, et al.
Published: (2024)
Scaling Agents for Computer Use
by Gonzalez-Pumariega, Gonzalo, et al.
Published: (2025)
On the Reliability of Computer Use Agents
by Gonzalez-Pumariega, Gonzalo, et al.
Published: (2026)
An Embodied Generalist Agent in 3D World
by Huang, Jiangyong, et al.
Published: (2023)
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
by Yang, Bowen, et al.
Published: (2026)
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
by Hu, Xueyu, et al.
Published: (2025)
GPT-4V(ision) is a Generalist Web Agent, if Grounded
by Zheng, Boyuan, et al.
Published: (2024)
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
by Hu, Siyuan, et al.
Published: (2024)
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
by Zheng, Kaizhi, et al.
Published: (2022)
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
by Kapoor, Raghav, et al.
Published: (2024)
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
by Ashraf, Tajamul, et al.
Published: (2025)
Self-Resource Allocation in Multi-Agent LLM Systems
by Amayuelas, Alfonso, et al.
Published: (2025)
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by Li, Yan, et al.
Published: (2026)
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by Feng, Weixi, et al.
Published: (2024)
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
by Hu, Wenbo, et al.
Published: (2026)
ComCLIP: Training-Free Compositional Image and Text Matching
by Jiang, Kenan, et al.
Published: (2022)
CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering
by Saha, Aranya, et al.
Published: (2025)
InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
by InternAgent Team, et al.
Published: (2025)
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by Zhang, Fan, et al.
Published: (2024)
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
by Liu, Xiao, et al.
Published: (2024)
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
by Ji, Haonian, et al.
Published: (2025)
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
by Li, Bingxuan, et al.
Published: (2025)
Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning
by Zhang, Yunchao, et al.
Published: (2022)
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
by LASA Team, et al.
Published: (2025)
DPO Learning with LLMs-Judge Signal for Computer Use Agents
by Luo, Man, et al.
Published: (2025)
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
by Wu, Wei, et al.
Published: (2026)
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
by Chen, Haoyu, et al.
Published: (2024)
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
by Xi, Jiajun, et al.
Published: (2024)
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
by Jin, Zhuoran, et al.
Published: (2025)
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
by Buettner, Kyle, et al.
Published: (2025)
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning
by Gu, Zishan, et al.
Published: (2024)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by Wu, Qianhui, et al.
Published: (2025)
A Multimodal Automated Interpretability Agent
by Shaham, Tamar Rott, et al.
Published: (2024)
Large Multimodal Agents: A Survey
by Xie, Junlin, et al.
Published: (2024)
OpenCUA: Open Foundations for Computer-Use Agents
by Wang, Xinyuan, et al.
Published: (2025)
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
by Liu, Dongrui, et al.
Published: (2026)
Fara-7B: An Efficient Agentic Model for Computer Use
by Awadallah, Ahmed, et al.
Published: (2025)
From Specialist to Generalist: Unlocking SAM's Learning Potential on Unlabeled Medical Images
by Vu, Vi, et al.
Published: (2026)
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
by Wei, Cong, et al.
Published: (2024)
MCU: An Evaluation Framework for Open-Ended Game Agents
by Zheng, Xinyue, et al.
Published: (2023)