Library Catalog

Bibliographic Details
Main Authors:	Song, Yuda; Wu, Lili; Foster, Dylan J.; Krishnamurthy, Akshay
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2405.19269

Similar Items

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity
by: Amortila, Philip, et al.
Published: (2024)

Scalable Online Exploration via Coverability
by: Amortila, Philip, et al.
Published: (2024)

Representation-Based Exploration for Language Models: From Test-Time to Post-Training
by: Tuyls, Jens, et al.
Published: (2025)

Hybrid Reinforcement Learning from Offline Observation Alone
by: Song, Yuda, et al.
Published: (2024)

Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification
by: Rohatgi, Dhruv, et al.
Published: (2025)

Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
by: Rajaraman, Nived, et al.
Published: (2026)

Can large language models explore in-context?
by: Krishnamurthy, Akshay, et al.
Published: (2024)

The Role of Environment Access in Agnostic Reinforcement Learning
by: Krishnamurthy, Akshay, et al.
Published: (2025)

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
by: Huang, Audrey, et al.
Published: (2025)

Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning
by: Amortila, Philip, et al.
Published: (2024)

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
by: Xie, Tengyang, et al.
Published: (2024)

Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning
by: Rohatgi, Dhruv, et al.
Published: (2025)

Expanding the Capabilities of Reinforcement Learning via Text Feedback
by: Song, Yuda, et al.
Published: (2026)

The Power of Resets in Online Reinforcement Learning
by: Mhammedi, Zakaria, et al.
Published: (2024)

To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
by: Song, Yuda, et al.
Published: (2025)

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
by: Huang, Audrey, et al.
Published: (2024)

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
by: Wu, Runzhe, et al.
Published: (2024)

Harnessing Density Ratios for Online Reinforcement Learning
by: Amortila, Philip, et al.
Published: (2024)

Self-Improvement in Language Models: The Sharpening Mechanism
by: Huang, Audrey, et al.
Published: (2024)

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
by: Golowich, Noah, et al.
Published: (2026)

Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
by: Snow, Luke, et al.
Published: (2023)

The Coverage Principle: How Pre-Training Enables Post-Training
by: Chen, Fan, et al.
Published: (2025)

Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration
by: Foster, Dylan J., et al.
Published: (2025)

Next-Latent Prediction Transformers Learn Compact World Models
by: Teoh, Jayden, et al.
Published: (2025)

Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization
by: Krishnamurthy, Vikram
Published: (2025)

Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
by: Krishnamurthy, Vikram, et al.
Published: (2026)

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
by: Song, Yuda, et al.
Published: (2024)

Learning Hidden Markov Models Using Conditional Samples
by: Kakade, Sham M., et al.
Published: (2023)

Maximum Likelihood Reinforcement Learning
by: Tajwar, Fahim, et al.
Published: (2026)

Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization
by: Jain, Adit, et al.
Published: (2024)

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
by: Foster, Dylan J., et al.
Published: (2024)

A Unifying View of Coverage in Linear Off-Policy Evaluation
by: Amortila, Philip, et al.
Published: (2026)

Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing
by: Snow, Luke, et al.
Published: (2024)

Accelerating Unbiased LLM Evaluation via Synthetic Feedback
by: Zhou, Zhaoyi, et al.
Published: (2025)

Outcome-based Exploration for LLM Reasoning
by: Song, Yuda, et al.
Published: (2025)

Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
by: Foster, Jack, et al.
Published: (2023)

Simultaneous Latent State Estimation and Latent Linear Dynamics Discovery from Image Observations
by: Kostin, Nikita
Published: (2025)

Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions
by: Kong, Lingkai, et al.
Published: (2026)

Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments
by: Yang, Xue, et al.
Published: (2025)

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure
by: Li, Peilun, et al.
Published: (2026)