Library Catalog

Bibliographic Details
Main Authors:	Qian, Jian; Hu, Haichen; Simchi-Levi, David
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2405.17796

Similar Items

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation
by: Hu, Haichen, et al.
Published: (2026)

Interleaved Resampling and Refitting: Data and Compute-Efficient Evaluation of Black-Box Predictors
by: Hu, Haichen, et al.
Published: (2026)

Perturbing the Derivative: Doubly Wild Refitting for Model-Free Evaluation of Opaque Machine Learning Predictors
by: Hu, Haichen, et al.
Published: (2025)

Perturbing the Derivative: Wild Refitting for Model-Free Evaluation of Machine Learning Models under Bregman Losses
by: Hu, Haichen, et al.
Published: (2025)

Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective
by: Hu, Haichen, et al.
Published: (2025)

Contextual Online Decision Making with Infinite-Dimensional Functional Regression
by: Hu, Haichen, et al.
Published: (2025)

Constrained Online Decision-Making: A Unified Framework
by: Hu, Haichen, et al.
Published: (2025)

On the Optimal Regret of Locally Private Linear Contextual Bandit
by: Li, Jiachun, et al.
Published: (2024)

Exploration-Exploitation Tradeoff in Universal Lossy Compression
by: Weinberger, Nir, et al.
Published: (2025)

Neural Exploitation and Exploration of Contextual Bandits
by: Ban, Yikun, et al.
Published: (2023)

From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms
by: Ai, Rui, et al.
Published: (2025)

Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices
by: Ao, Ruicheng, et al.
Published: (2025)

Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
by: Bhattacharyya, Riddhiman, et al.
Published: (2026)

Sobolev Norm Learning Rates for Conditional Mean Embeddings
by: Talwai, Prem, et al.
Published: (2021)

Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
by: Chen, Shuze, et al.
Published: (2024)

Optimal Adaptive Experimental Design for Estimating Treatment Effect
by: Li, Jiachun, et al.
Published: (2024)

Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions
by: Mhammedi, Zakaria
Published: (2024)

Navigating the Exploration-Exploitation Tradeoff in Inference-Time Scaling of Diffusion Models
by: Su, Xun, et al.
Published: (2025)

Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs
by: Weltevrede, Max, et al.
Published: (2024)

Near-Optimal Regret for Policy Optimization in Contextual MDPs with General Offline Function Approximation
by: Levy, Orin, et al.
Published: (2026)

Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
by: Qin, Hao, et al.
Published: (2026)

Prediction-Guided Active Experiments
by: Ao, Ruicheng, et al.
Published: (2024)

Beyond ATE: Multi-Criteria Design for A/B Testing
by: Li, Jiachun, et al.
Published: (2025)

Regret-Oracle Complexity Tradeoffs in Agnostic Online Learning
by: Attias, Idan, et al.
Published: (2026)

A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits
by: Simchi-Levi, David, et al.
Published: (2022)

Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
by: Chen, Hongyu, et al.
Published: (2026)

The Value of Information in Resource-Constrained Pricing
by: Ao, Ruicheng, et al.
Published: (2026)

Privacy Preserving Adaptive Experiment Design
by: Li, Jiachun, et al.
Published: (2024)

Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk
by: Simchi-Levi, David, et al.
Published: (2023)

On the Reliability Limits of LLM-Based Multi-Agent Planning
by: Ao, Ruicheng, et al.
Published: (2026)

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
by: Ao, Ruicheng, et al.
Published: (2026)

OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents
by: Ao, Ruicheng, et al.
Published: (2026)

Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
by: Liang, Hao, et al.
Published: (2026)

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
by: Pires, Pedro R., et al.
Published: (2025)

Offline-Online Reinforcement Learning for Linear Mixture MDPs
by: Zhang, Zhongjun, et al.
Published: (2026)

Blind Network Revenue Management and Bandits with Knapsacks under Limited Switches
by: Simchi-Levi, David, et al.
Published: (2019)

Efficient Model-Free Exploration in Low-Rank MDPs
by: Mhammedi, Zakaria, et al.
Published: (2023)

Sample Complexity Characterization for Linear Contextual MDPs
by: Deng, Junze, et al.
Published: (2024)

Eluder-based Regret for Stochastic Contextual MDPs
by: Levy, Orin, et al.
Published: (2022)

Provable Offline Reinforcement Learning for Structured Cyclic MDPs
by: Lee, Kyungbok, et al.
Published: (2026)