| Main Authors: | Song, Yuda, Wu, Lili, Foster, Dylan J., Krishnamurthy, Akshay |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.19269 |
Similar Items
Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity
by: Amortila, Philip, et al.
Published: (2024)
Scalable Online Exploration via Coverability
by: Amortila, Philip, et al.
Published: (2024)
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
by: Tuyls, Jens, et al.
Published: (2025)
Hybrid Reinforcement Learning from Offline Observation Alone
by: Song, Yuda, et al.
Published: (2024)
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification
by: Rohatgi, Dhruv, et al.
Published: (2025)
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
by: Rajaraman, Nived, et al.
Published: (2026)
Can large language models explore in-context?
by: Krishnamurthy, Akshay, et al.
Published: (2024)
The Role of Environment Access in Agnostic Reinforcement Learning
by: Krishnamurthy, Akshay, et al.
Published: (2025)
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
by: Huang, Audrey, et al.
Published: (2025)
Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning
by: Amortila, Philip, et al.
Published: (2024)
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
by: Xie, Tengyang, et al.
Published: (2024)
Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning
by: Rohatgi, Dhruv, et al.
Published: (2025)
Expanding the Capabilities of Reinforcement Learning via Text Feedback
by: Song, Yuda, et al.
Published: (2026)
The Power of Resets in Online Reinforcement Learning
by: Mhammedi, Zakaria, et al.
Published: (2024)
To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
by: Song, Yuda, et al.
Published: (2025)
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
by: Huang, Audrey, et al.
Published: (2024)
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
by: Wu, Runzhe, et al.
Published: (2024)
Harnessing Density Ratios for Online Reinforcement Learning
by: Amortila, Philip, et al.
Published: (2024)
Self-Improvement in Language Models: The Sharpening Mechanism
by: Huang, Audrey, et al.
Published: (2024)
Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
by: Golowich, Noah, et al.
Published: (2026)
Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
by: Snow, Luke, et al.
Published: (2023)
The Coverage Principle: How Pre-Training Enables Post-Training
by: Chen, Fan, et al.
Published: (2025)
Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration
by: Foster, Dylan J., et al.
Published: (2025)
Next-Latent Prediction Transformers Learn Compact World Models
by: Teoh, Jayden, et al.
Published: (2025)
Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization
by: Krishnamurthy, Vikram
Published: (2025)
Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
by: Krishnamurthy, Vikram, et al.
Published: (2026)
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
by: Song, Yuda, et al.
Published: (2024)
Learning Hidden Markov Models Using Conditional Samples
by: Kakade, Sham M., et al.
Published: (2023)
Maximum Likelihood Reinforcement Learning
by: Tajwar, Fahim, et al.
Published: (2026)
Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization
by: Jain, Adit, et al.
Published: (2024)
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
by: Foster, Dylan J., et al.
Published: (2024)
A Unifying View of Coverage in Linear Off-Policy Evaluation
by: Amortila, Philip, et al.
Published: (2026)
Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing
by: Snow, Luke, et al.
Published: (2024)
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
by: Zhou, Zhaoyi, et al.
Published: (2025)
Outcome-based Exploration for LLM Reasoning
by: Song, Yuda, et al.
Published: (2025)
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
by: Foster, Jack, et al.
Published: (2023)
Simultaneous Latent State Estimation and Latent Linear Dynamics Discovery from Image Observations
by: Kostin, Nikita
Published: (2025)
Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions
by: Kong, Lingkai, et al.
Published: (2026)
Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments
by: Yang, Xue, et al.
Published: (2025)
Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure
by: Li, Peilun, et al.
Published: (2026)