| Main Authors: | Dai, Yutong, Ramakrishnan, Krithika, Gu, Jing, Fernandez, Matthew, Luo, Yanqi, Prabhu, Viraj, Hu, Zhenyu, Savarese, Silvio, Xiong, Caiming, Chen, Zeyuan, Xu, Ran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.26506 |
Similar Items
WALT: Web Agents that Learn Tools
by: Prabhu, Viraj, et al.
Published: (2025)
CoAct-1: Computer-using Multi-Agent System with Coding Actions
by: Song, Linxin, et al.
Published: (2025)
How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
by: Yang, Luyu, et al.
Published: (2026)
Trust but Verify: Programmatic VLM Evaluation in the Wild
by: Prabhu, Viraj, et al.
Published: (2024)
Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research
by: Lan, Tian, et al.
Published: (2024)
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
by: Wang, Zhenhailong, et al.
Published: (2025)
Shared Imagination: LLMs Hallucinate Alike
by: Zhou, Yilun, et al.
Published: (2024)
GTA1: GUI Test-time Scaling Agent
by: Yang, Yan, et al.
Published: (2025)
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
by: Pang, Bo, et al.
Published: (2025)
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
by: Le, Hung, et al.
Published: (2024)
GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
by: Aksu, Taha, et al.
Published: (2024)
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
by: Peng, Xiangyu, et al.
Published: (2025)
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
by: Pang, Bo, et al.
Published: (2025)
Unified Training of Universal Time Series Forecasting Transformers
by: Woo, Gerald, et al.
Published: (2024)
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
by: Li, Jierui, et al.
Published: (2024)
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
by: Qin, Can, et al.
Published: (2024)
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
by: Luo, Ziyang, et al.
Published: (2025)
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
by: Peng, Yun, et al.
Published: (2024)
Asynchronous Tool Usage for Real-Time Agents
by: Ginart, Antonio A., et al.
Published: (2024)
HIVE: Harnessing Human Feedback for Instructional Visual Editing
by: Zhang, Shu, et al.
Published: (2023)
Text2Data: Low-Resource Data Generation with Textual Control
by: Wang, Shiyu, et al.
Published: (2024)
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
by: Ginart, Antonio A., et al.
Published: (2025)
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval
by: Liu, Ye, et al.
Published: (2024)
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
by: Murthy, Rithesh, et al.
Published: (2024)
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
by: Zhou, Honglu, et al.
Published: (2025)
Salesforce Salesforce Certified AI Specialist PDF
by: Certification Exam
Published: (2026)
Salesforce Salesforce Certified Heroku Architect PDF
by: Certification Exam
Published: (2026)
Salesforce Salesforce Plat-Admn-201 PDF
by: Certification Exam
Published: (2026)
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
by: Zhang, Jieyu, et al.
Published: (2024)
xGen-small Technical Report
by: Nijkamp, Erik, et al.
Published: (2025)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
by: Nguyen, Xuan-Phi, et al.
Published: (2025)
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
by: Liao, Baohao, et al.
Published: (2025)
ViUniT: Visual Unit Tests for More Robust Visual Programming
by: Panagopoulou, Artemis, et al.
Published: (2024)
Entropy-Based Block Pruning for Efficient Large Language Models
by: Yang, Liangwei, et al.
Published: (2025)
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics
by: Prabhakar, Akshara, et al.
Published: (2025)
BLIP3o-NEXT: Next Frontier of Native Image Generation
by: Chen, Jiuhai, et al.
Published: (2025)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
by: Xie, Tianbao, et al.
Published: (2024)
Salesforce Salesforce Certified Platform App Builder PDF
by: Certification Exam
Published: (2026)
Salesforce Salesforce Einstein-Analytics-and-Discovery-Consultant PDF
by: Certification Exam
Published: (2026)
Salesforce Salesforce Certified Sharing and Visibility Architect PDF
by: Certification Exam
Published: (2026)