:: Library Catalog

Bibliographic Details
Main Authors:	Bai, Hao; Zhou, Yifei; Cemri, Mert; Pan, Jiayi; Suhr, Alane; Levine, Sergey; Kumar, Aviral
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.11896

Similar Items

Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)

Autonomous Evaluation and Refinement of Digital Agents
by: Pan, Jiayi, et al.
Published: (2024)

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
by: Zhai, Yuexiang, et al.
Published: (2024)

Is Value Learning Really the Main Bottleneck in Offline RL?
by: Park, Seohong, et al.
Published: (2024)

Training Software Engineering Agents and Verifiers with SWE-Gym
by: Pan, Jiayi, et al.
Published: (2024)

Scaling Test-Time Compute Without Verification or RL is Suboptimal
by: Setlur, Amrith, et al.
Published: (2025)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
by: Zhou, Zhiyuan, et al.
Published: (2024)

Learning Adaptive Parallel Reasoning with Language Models
by: Pan, Jiayi, et al.
Published: (2025)

Horizon Reduction Makes RL Scalable
by: Park, Seohong, et al.
Published: (2025)

RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
by: Wu, Mian, et al.
Published: (2025)

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)

Compute-Optimal Scaling for Value-Based Deep RL
by: Fu, Preston, et al.
Published: (2025)

Evaluating Model Perception of Color Illusions in Photorealistic Scenes
by: Mao, Lingjun, et al.
Published: (2024)

Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)

Value-Based Deep RL Scales Predictably
by: Rybkin, Oleh, et al.
Published: (2025)

ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)

DigiData: Training and Evaluating General-Purpose Mobile Control Agents
by: Sun, Yuxuan, et al.
Published: (2025)

Unfamiliar Finetuning Examples Control How Language Models Hallucinate
by: Kang, Katie, et al.
Published: (2024)

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
by: Zhou, Yifei, et al.
Published: (2024)

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
by: Rafailov, Rafael, et al.
Published: (2024)

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
by: Nakamoto, Mitsuhiko, et al.
Published: (2024)

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
by: Bai, Hao, et al.
Published: (2026)

Using Language Models to Disambiguate Lexical Choices in Translation
by: Barua, Josh, et al.
Published: (2024)

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
by: Sclar, Melanie, et al.
Published: (2023)

Long Chain-of-Thought Reasoning Across Languages
by: Barua, Josh, et al.
Published: (2025)

DigiSoup: A Zero-Training Entropy-Driven Agent Beats Trained Reinforcement Learning on Multi-Agent Social Dilemmas
by: Matthew, Fearne
Published: (2026)

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
by: Wang, Taiyi, et al.
Published: (2024)

Visual Pre-Training on Unlabeled Images using Reinforcement Learning
by: Ghosh, Dibya, et al.
Published: (2025)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
by: Farebrother, Jesse, et al.
Published: (2024)

floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
by: Agrawalla, Bhavya, et al.
Published: (2025)

Reinforcement Learning with Action Chunking
by: Li, Qiyang, et al.
Published: (2025)

ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data
by: Dashora, Nitish, et al.
Published: (2025)

Training Diffusion Models with Reinforcement Learning
by: Black, Kevin, et al.
Published: (2023)

Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)

SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
by: Hirose, Noriaki, et al.
Published: (2024)

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
by: Kang, Katie, et al.
Published: (2024)

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations
by: Hong, Joey, et al.
Published: (2024)