| Main Authors: | Bai, Hao, Zhou, Yifei, Cemri, Mert, Pan, Jiayi, Suhr, Alane, Levine, Sergey, Kumar, Aviral |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.11896 |
Similar Items
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)
Autonomous Evaluation and Refinement of Digital Agents
by: Pan, Jiayi, et al.
Published: (2024)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
by: Zhai, Yuexiang, et al.
Published: (2024)
Is Value Learning Really the Main Bottleneck in Offline RL?
by: Park, Seohong, et al.
Published: (2024)
Training Software Engineering Agents and Verifiers with SWE-Gym
by: Pan, Jiayi, et al.
Published: (2024)
Scaling Test-Time Compute Without Verification or RL is Suboptimal
by: Setlur, Amrith, et al.
Published: (2025)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
by: Zhou, Zhiyuan, et al.
Published: (2024)
Learning Adaptive Parallel Reasoning with Language Models
by: Pan, Jiayi, et al.
Published: (2025)
Horizon Reduction Makes RL Scalable
by: Park, Seohong, et al.
Published: (2025)
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
by: Wu, Mian, et al.
Published: (2025)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)
Compute-Optimal Scaling for Value-Based Deep RL
by: Fu, Preston, et al.
Published: (2025)
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
by: Mao, Lingjun, et al.
Published: (2024)
Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)
Value-Based Deep RL Scales Predictably
by: Rybkin, Oleh, et al.
Published: (2025)
ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)
DigiData: Training and Evaluating General-Purpose Mobile Control Agents
by: Sun, Yuxuan, et al.
Published: (2025)
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
by: Kang, Katie, et al.
Published: (2024)
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
by: Zhou, Yifei, et al.
Published: (2024)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
by: Rafailov, Rafael, et al.
Published: (2024)
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
by: Nakamoto, Mitsuhiko, et al.
Published: (2024)
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
by: Bai, Hao, et al.
Published: (2026)
Using Language Models to Disambiguate Lexical Choices in Translation
by: Barua, Josh, et al.
Published: (2024)
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
by: Sclar, Melanie, et al.
Published: (2023)
Long Chain-of-Thought Reasoning Across Languages
by: Barua, Josh, et al.
Published: (2025)
DigiSoup: A Zero-Training Entropy-Driven Agent Beats Trained Reinforcement Learning on Multi-Agent Social Dilemmas
by: Matthew, Fearne
Published: (2026)
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
by: Wang, Taiyi, et al.
Published: (2024)
Visual Pre-Training on Unlabeled Images using Reinforcement Learning
by: Ghosh, Dibya, et al.
Published: (2025)
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
by: Farebrother, Jesse, et al.
Published: (2024)
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
by: Agrawalla, Bhavya, et al.
Published: (2025)
Reinforcement Learning with Action Chunking
by: Li, Qiyang, et al.
Published: (2025)
ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data
by: Dashora, Nitish, et al.
Published: (2025)
Training Diffusion Models with Reinforcement Learning
by: Black, Kevin, et al.
Published: (2023)
Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)
SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
by: Hirose, Noriaki, et al.
Published: (2024)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
by: Kang, Katie, et al.
Published: (2024)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations
by: Hong, Joey, et al.
Published: (2024)