| Main Authors: | Mukherjee, Subhojyoti; Hanna, Josiah P.; Nowak, Robert |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.02165 |
Similar Items
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
by: Mukherjee, Subhojyoti, et al.
Published: (2023)
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
by: Mukherjee, Subhojyoti, et al.
Published: (2024)
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
by: Corrado, Nicholas E., et al.
Published: (2023)
Off-Policy Evaluation from Logged Human Feedback
by: Bhargava, Aniruddha, et al.
Published: (2024)
An Empirical Study on the Power of Future Prediction in Partially Observable Environments
by: Kwon, Jeongyeol, et al.
Published: (2024)
Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
by: Corrado, Nicholas E., et al.
Published: (2025)
Adaptive Exploration for Data-Efficient General Value Function Evaluations
by: Jain, Arushi, et al.
Published: (2024)
Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
by: Zhou, Hongyi, et al.
Published: (2025)
Efficient and Interpretable Bandit Algorithms
by: Mukherjee, Subhojyoti, et al.
Published: (2023)
Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates
by: Corrado, Nicholas E., et al.
Published: (2023)
MDP Planning as Policy Inference
by: Tolpin, David
Published: (2026)
Optimal Design for Human Preference Elicitation
by: Mukherjee, Subhojyoti, et al.
Published: (2024)
SaVe-TAG: LLM-based Interpolation for Long-Tailed Text-Attributed Graphs
by: Wang, Leyao, et al.
Published: (2024)
Agentic Planning with Reasoning for Image Styling via Offline RL
by: Mukherjee, Subhojyoti, et al.
Published: (2026)
Logits are All We Need to Adapt Closed Models
by: Hiranandani, Gaurush, et al.
Published: (2025)
Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization
by: Mukherjee, Subhojyoti, et al.
Published: (2024)
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
by: Corrado, Nicholas E., et al.
Published: (2026)
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
by: Corrado, Nicholas E., et al.
Published: (2023)
Sparsely Multimodal Data Fusion
by: Bjorgaard, Josiah
Published: (2024)
Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
by: Kone, Cyrille, et al.
Published: (2026)
Stable Offline Value Function Learning with Bisimulation-based Representations
by: Pavse, Brahma S., et al.
Published: (2024)
Structured Evaluation of Synthetic Tabular Data
by: Yang, Scott Cheng-Hsin, et al.
Published: (2024)
Experimental Design for Active Transductive Inference in Large Language Models
by: Mukherjee, Subhojyoti, et al.
Published: (2024)
MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
by: Shafieinejad, Masoumeh, et al.
Published: (2026)
Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
by: Markgraf, Hannah, et al.
Published: (2025)
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
by: Pavse, Brahma S., et al.
Published: (2023)
VeRO: An Evaluation Harness for Agents to Optimize Agents
by: Ursekar, Varun, et al.
Published: (2026)
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
by: Choudhary, Kartik, et al.
Published: (2024)
Sarcasm Detection in Tweets with BERT and GloVe Embeddings
by: Khatri, Akshay, et al.
Published: (2020)
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
by: Madhow, Sunil, et al.
Published: (2023)
CoVeR: Conformal Calibration for Versatile and Reliable Autoregressive Next-Token Prediction
by: Chen, Yuzhu, et al.
Published: (2025)
AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models
by: Kveton, Branislav, et al.
Published: (2026)
Geometric Re-Analysis of Classical MDP Solving Algorithms
by: Mustafin, Arsenii, et al.
Published: (2025)
Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets
by: Nagesh, Nitish, et al.
Published: (2026)
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking
by: Herurkar, Dayananda, et al.
Published: (2025)
A Systematic Evaluation of Generative Models on Tabular Transportation Data
by: Wang, Chengen, et al.
Published: (2025)
FEST: A Unified Framework for Evaluating Synthetic Tabular Data
by: Niu, Weijie, et al.
Published: (2025)
MDP Geometry, Normalization and Reward Balancing Solvers
by: Mustafin, Arsenii, et al.
Published: (2024)
Learning to Reason in LLMs by Expectation Maximization
by: Lee, Junghyun, et al.
Published: (2025)