:: Library Catalog

Bibliographic Details
Main Authors:	Landers, Matthew; Killian, Taylor W.; Hartvigsen, Thomas; Doryab, Afsaneh
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.04441

Similar Items

SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces
by: Landers, Matthew, et al.
Published: (2025)

BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
by: Landers, Matthew, et al.
Published: (2024)

Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
by: Cardei, Maria Ana, et al.
Published: (2026)

Factorized Deep Q-Network for Cooperative Multi-Agent Reinforcement Learning in Victim Tagging
by: Cardei, Maria Ana, et al.
Published: (2025)

Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)

Improving Offline RL by Blending Heuristics
by: Geng, Sinong, et al.
Published: (2023)

Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies
by: Zhu, Lingwei, et al.
Published: (2025)

Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces
by: Akkerman, Fabian, et al.
Published: (2023)

Scalable Offline Model-Based RL with Action Chunks
by: Park, Kwanyoung, et al.
Published: (2025)

Flow Matching for Offline Reinforcement Learning with Discrete Actions
by: Khan, Fairoz Nower, et al.
Published: (2026)

Constrained Discrete Diffusion
by: Cardei, Michael, et al.
Published: (2025)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
by: Zurek, Matthew, et al.
Published: (2025)

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
by: Muslimani, Calarina, et al.
Published: (2025)

HIQL: Offline Goal-Conditioned RL with Latent States as Actions
by: Park, Seohong, et al.
Published: (2023)

Inference Time Policy Optimization for Offline RL with Differentiable World Models
by: Deb, Rohan, et al.
Published: (2026)

An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces
by: Beeson, Alex, et al.
Published: (2024)

DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
by: Kim, Changyeon, et al.
Published: (2025)

GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL
by: Wang, Haoyu, et al.
Published: (2026)

Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL
by: Chen, Zhaoyang, et al.
Published: (2025)

Offline RL for Adaptive Policy Retrieval in Prior Authorization
by: Sharifullin, Ruslan, et al.
Published: (2026)

Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces
by: Gao, Ji, et al.
Published: (2026)

Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
by: Hu, Jifeng, et al.
Published: (2024)

Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning
by: Zhu, Yuanyang, et al.
Published: (2024)

Stochastic Q-learning for Large Discrete Action Spaces
by: Fourati, Fares, et al.
Published: (2024)

A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
by: Killian, Earl
Published: (2026)

Dataset Clustering for Improved Offline Policy Learning
by: Wang, Qiang, et al.
Published: (2024)

Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC
by: Soni, Aditya, et al.
Published: (2024)

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model
by: Kang, Enoch H., et al.
Published: (2025)

Rewarded Region Replay (R3) for Policy Learning with Discrete Action Space
by: Li, Bangzheng, et al.
Published: (2024)

Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL
by: Choi, Jinwoo, et al.
Published: (2026)

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)

Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)

Accelerating Energy-Efficient Federated Learning in Cell-Free Networks with Adaptive Quantization
by: Mahmoudi, Afsaneh, et al.
Published: (2024)

SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space
by: K, Swaminathan S, et al.
Published: (2026)

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
by: Alvo, Matias, et al.
Published: (2026)

Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
by: Duan, Xintong, et al.
Published: (2025)

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
by: Zhan, Wenhao, et al.
Published: (2024)

Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only
by: Xiao, Wei, et al.
Published: (2025)