:: Library Catalog

Bibliographic Details
Main Authors:	Bai, Hao; Taymanov, Alexey; Zhang, Tong; Kumar, Aviral; Whitehead, Spencer
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.02439

Similar Items

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
by: Koh, Jing Yu, et al.
Published: (2024)

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents
by: Kar, Oğuzhan Fatih, et al.
Published: (2026)

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)

OceanGym: A Benchmark Environment for Underwater Embodied Agents
by: Xue, Yida, et al.
Published: (2025)

WebInject: Prompt Injection Attack to Web Agents
by: Wang, Xilong, et al.
Published: (2025)

WALT: Web Agents that Learn Tools
by: Prabhu, Viraj, et al.
Published: (2025)

Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations
by: Kumar, Manoj, et al.
Published: (2024)

LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
by: Li, Fanfei, et al.
Published: (2025)

WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training
by: Ellis, Jeremy
Published: (2026)

No Training Wheels: Steering Vectors for Bias Correction at Inference Time
by: Gupta, Aviral, et al.
Published: (2025)

Web-based Melanoma Detection
by: Kim, SangHyuk, et al.
Published: (2024)

Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models
by: Agrawal, Susmit, et al.
Published: (2025)

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)

HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis
by: Liu, Zhi-Bo, et al.
Published: (2024)

Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
by: Chen, Hongyi, et al.
Published: (2025)

The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks
by: Zhang, Gongyue, et al.
Published: (2024)

EEO-TFV: Escape-Explore Optimizer for Web-Scale Time-Series Forecasting and Vision Analysis
by: Wang, Hua, et al.
Published: (2026)

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
by: Cheang, Chi-Lam, et al.
Published: (2024)

TimeWarp: Evaluating Web Agents by Revisiting the Past
by: Ishmam, Md Farhan, et al.
Published: (2026)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)

Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)

InnoGym: Benchmarking the Innovation Potential of AI Agents
by: Zhang, Jintian, et al.
Published: (2025)

Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations
by: Hammam, Ahmed, et al.
Published: (2024)

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
by: Gupta, Tanmay, et al.
Published: (2026)

TextSquare: Scaling up Text-Centric Visual Instruction Tuning
by: Tang, Jingqun, et al.
Published: (2024)

Descripción automática de secciones delgadas de rocas: una aplicación Web
by: Paucar, Stalyn, et al.
Published: (2024)

DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models
by: Bertazzini, Giulia, et al.
Published: (2025)

Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)

ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
by: Shi, Junyao, et al.
Published: (2025)

Growing Visual Generative Capacity for Pre-Trained MLLMs
by: Wang, Hanyu, et al.
Published: (2025)

RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
by: Ye, Suyu, et al.
Published: (2025)

Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
by: He, Haoran, et al.
Published: (2024)

Pix2Fact: When Vision Is Not Enough -- Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes
by: Jiang, Yifan, et al.
Published: (2026)

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
by: Chen, Yangyi, et al.
Published: (2025)

Visual Test-time Scaling for GUI Agent Grounding
by: Luo, Tiange, et al.
Published: (2025)

Time-, Memory- and Parameter-Efficient Visual Adaptation
by: Mercea, Otniel-Bogdan, et al.
Published: (2024)

Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
by: Gokce, Abdulkadir, et al.
Published: (2024)

WebCryptoAgent: Agentic Crypto Trading with Web Informatics
by: Kurban, Ali, et al.
Published: (2026)

Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
by: Novikov, Georgii, et al.
Published: (2024)