| Main Authors: | Bai, Hao, Taymanov, Alexey, Zhang, Tong, Kumar, Aviral, Whitehead, Spencer |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.02439 |
Similar Items
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
by: Koh, Jing Yu, et al.
Published: (2024)
Weblica: Scalable and Reproducible Training Environments for Visual Web Agents
by: Kar, Oğuzhan Fatih, et al.
Published: (2026)
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)
OceanGym: A Benchmark Environment for Underwater Embodied Agents
by: Xue, Yida, et al.
Published: (2025)
WebInject: Prompt Injection Attack to Web Agents
by: Wang, Xilong, et al.
Published: (2025)
WALT: Web Agents that Learn Tools
by: Prabhu, Viraj, et al.
Published: (2025)
Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations
by: Kumar, Manoj, et al.
Published: (2024)
LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
by: Li, Fanfei, et al.
Published: (2025)
WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training
by: Ellis, Jeremy
Published: (2026)
No Training Wheels: Steering Vectors for Bias Correction at Inference Time
by: Gupta, Aviral, et al.
Published: (2025)
Web-based Melanoma Detection
by: Kim, SangHyuk, et al.
Published: (2024)
Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models
by: Agrawal, Susmit, et al.
Published: (2025)
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)
HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis
by: Liu, Zhi-Bo, et al.
Published: (2024)
Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
by: Chen, Hongyi, et al.
Published: (2025)
The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks
by: Zhang, Gongyue, et al.
Published: (2024)
EEO-TFV: Escape-Explore Optimizer for Web-Scale Time-Series Forecasting and Vision Analysis
by: Wang, Hua, et al.
Published: (2026)
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
by: Cheang, Chi-Lam, et al.
Published: (2024)
TimeWarp: Evaluating Web Agents by Revisiting the Past
by: Ishmam, Md Farhan, et al.
Published: (2026)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)
Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)
InnoGym: Benchmarking the Innovation Potential of AI Agents
by: Zhang, Jintian, et al.
Published: (2025)
Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations
by: Hammam, Ahmed, et al.
Published: (2024)
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
by: Gupta, Tanmay, et al.
Published: (2026)
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
by: Tang, Jingqun, et al.
Published: (2024)
Descripción automática de secciones delgadas de rocas: una aplicación Web
by: Paucar, Stalyn, et al.
Published: (2024)
DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models
by: Bertazzini, Giulia, et al.
Published: (2025)
Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)
ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
by: Shi, Junyao, et al.
Published: (2025)
Growing Visual Generative Capacity for Pre-Trained MLLMs
by: Wang, Hanyu, et al.
Published: (2025)
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
by: Ye, Suyu, et al.
Published: (2025)
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
by: He, Haoran, et al.
Published: (2024)
Pix2Fact: When Vision Is Not Enough -- Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes
by: Jiang, Yifan, et al.
Published: (2026)
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
by: Chen, Yangyi, et al.
Published: (2025)
Visual Test-time Scaling for GUI Agent Grounding
by: Luo, Tiange, et al.
Published: (2025)
Time-, Memory- and Parameter-Efficient Visual Adaptation
by: Mercea, Otniel-Bogdan, et al.
Published: (2024)
Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
by: Gokce, Abdulkadir, et al.
Published: (2024)
WebCryptoAgent: Agentic Crypto Trading with Web Informatics
by: Kurban, Ali, et al.
Published: (2026)
Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
by: Novikov, Georgii, et al.
Published: (2024)