:: Library Catalog

Bibliographic Details
Main Authors:	Mukherjee, Subhojyoti; Hanna, Josiah P.; Nowak, Robert
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.02165

Similar Items

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
by: Mukherjee, Subhojyoti, et al.
Published: (2023)

Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
by: Mukherjee, Subhojyoti, et al.
Published: (2024)

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
by: Corrado, Nicholas E., et al.
Published: (2023)

Off-Policy Evaluation from Logged Human Feedback
by: Bhargava, Aniruddha, et al.
Published: (2024)

An Empirical Study on the Power of Future Prediction in Partially Observable Environments
by: Kwon, Jeongyeol, et al.
Published: (2024)

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
by: Corrado, Nicholas E., et al.
Published: (2025)

Adaptive Exploration for Data-Efficient General Value Function Evaluations
by: Jain, Arushi, et al.
Published: (2024)

Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)

Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
by: Zhou, Hongyi, et al.
Published: (2025)

Efficient and Interpretable Bandit Algorithms
by: Mukherjee, Subhojyoti, et al.
Published: (2023)

Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates
by: Corrado, Nicholas E., et al.
Published: (2023)

MDP Planning as Policy Inference
by: Tolpin, David
Published: (2026)

Optimal Design for Human Preference Elicitation
by: Mukherjee, Subhojyoti, et al.
Published: (2024)

SaVe-TAG: LLM-based Interpolation for Long-Tailed Text-Attributed Graphs
by: Wang, Leyao, et al.
Published: (2024)

Agentic Planning with Reasoning for Image Styling via Offline RL
by: Mukherjee, Subhojyoti, et al.
Published: (2026)

Logits are All We Need to Adapt Closed Models
by: Hiranandani, Gaurush, et al.
Published: (2025)

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization
by: Mukherjee, Subhojyoti, et al.
Published: (2024)

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
by: Corrado, Nicholas E., et al.
Published: (2026)

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
by: Corrado, Nicholas E., et al.
Published: (2023)

Sparsely Multimodal Data Fusion
by: Bjorgaard, Josiah
Published: (2024)

Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
by: Kone, Cyrille, et al.
Published: (2026)

Stable Offline Value Function Learning with Bisimulation-based Representations
by: Pavse, Brahma S., et al.
Published: (2024)

Structured Evaluation of Synthetic Tabular Data
by: Yang, Scott Cheng-Hsin, et al.
Published: (2024)

Experimental Design for Active Transductive Inference in Large Language Models
by: Mukherjee, Subhojyoti, et al.
Published: (2024)

MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
by: Shafieinejad, Masoumeh, et al.
Published: (2026)

Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
by: Markgraf, Hannah, et al.
Published: (2025)

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
by: Pavse, Brahma S., et al.
Published: (2023)

VeRO: An Evaluation Harness for Agents to Optimize Agents
by: Ursekar, Varun, et al.
Published: (2026)

ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
by: Choudhary, Kartik, et al.
Published: (2024)

Sarcasm Detection in Tweets with BERT and GloVe Embeddings
by: Khatri, Akshay, et al.
Published: (2020)

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
by: Madhow, Sunil, et al.
Published: (2023)

CoVeR: Conformal Calibration for Versatile and Reliable Autoregressive Next-Token Prediction
by: Chen, Yuzhu, et al.
Published: (2025)

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models
by: Kveton, Branislav, et al.
Published: (2026)

Geometric Re-Analysis of Classical MDP Solving Algorithms
by: Mustafin, Arsenii, et al.
Published: (2025)

Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets
by: Nagesh, Nitish, et al.
Published: (2026)

Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking
by: Herurkar, Dayananda, et al.
Published: (2025)

A Systematic Evaluation of Generative Models on Tabular Transportation Data
by: Wang, Chengen, et al.
Published: (2025)

FEST: A Unified Framework for Evaluating Synthetic Tabular Data
by: Niu, Weijie, et al.
Published: (2025)

MDP Geometry, Normalization and Reward Balancing Solvers
by: Mustafin, Arsenii, et al.
Published: (2024)

Learning to Reason in LLMs by Expectation Maximization
by: Lee, Junghyun, et al.
Published: (2025)