| Main authors: | He, Longxiang, Shen, Li, Wang, Xueqian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2405.18187 |
Similar items
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
by: He, Longxiang, et al.
Published: (2023)
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
by: Chitnis, Rohan, et al.
Published: (2023)
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)
FOSP: Fine-tuning Offline Safe Policy through World Models
by: Cao, Chenyang, et al.
Published: (2024)
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
by: Niu, Yifan, et al.
Published: (2025)
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
by: Yu, Song, et al.
Published: (2026)
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment
by: Wang, Haowen, et al.
Published: (2025)
Stepwise Alignment for Constrained Language Model Policy Optimization
by: Wachi, Akifumi, et al.
Published: (2024)
Aligning Flow Map Policies with Optimal Q-Guidance
by: Ziakas, Christos, et al.
Published: (2026)
Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration
by: Kim, Hwanwoo, et al.
Published: (2026)
GO4Align: Group Optimization for Multi-Task Alignment
by: Shen, Jiayi, et al.
Published: (2024)
Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
by: Wang, Mingyi, et al.
Published: (2026)
Proactive Constrained Policy Optimization with Preemptive Penalty
by: Yang, Ning, et al.
Published: (2025)
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning
by: Li, Zihao, et al.
Published: (2024)
Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
by: Zhang, Jing, et al.
Published: (2023)
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction
by: Wang, Haoyu, et al.
Published: (2026)
Skip-Connected Policy Optimization for Implicit Advantage
by: Teng, Fengwei, et al.
Published: (2026)
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)
Learning Generalizable Visuomotor Policy through Dynamics-Alignment
by: Lee, Dohyeok, et al.
Published: (2025)
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
by: Ding, Shutong, et al.
Published: (2024)
Segment-Aligned Policy Optimization for Multi-Modal Reasoning
by: Gao, Lei, et al.
Published: (2026)
Ensuring Semantics in Weights of Implicit Neural Representations through the Implicit Function Theorem
by: Qiu, Tianming, et al.
Published: (2026)
Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions
by: Aroca-Ouellette, Stéphane, et al.
Published: (2025)
Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment
by: Ma, Xiang, et al.
Published: (2026)
State-wise Constrained Policy Optimization
by: Zhao, Weiye, et al.
Published: (2023)
Autoregressive Policy Optimization for Constrained Allocation Tasks
by: Winkel, David, et al.
Published: (2024)
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy
by: Cao, Chenyang, et al.
Published: (2024)
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
by: Xia, Linxuan, et al.
Published: (2026)
e-COP : Episodic Constrained Optimization of Policies
by: Agnihotri, Akhil, et al.
Published: (2024)
Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning
by: Mao, Yixiu, et al.
Published: (2025)
Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization
by: Zhai, Zhiyuan, et al.
Published: (2026)
Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning
by: Hazra, Somnath, et al.
Published: (2025)
Improving Deep Learning Optimization through Constrained Parameter Regularization
by: Franke, Jörg K. H., et al.
Published: (2023)
DFWLayer: Differentiable Frank-Wolfe Optimization Layer
by: Liu, Zixuan, et al.
Published: (2023)
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
by: Shan, Zhao, et al.
Published: (2024)
Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks
by: Robertson, Zachary, et al.
Published: (2023)
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
by: Marion, Pierre, et al.
Published: (2024)
Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk
by: Tangri, Rohan, et al.
Published: (2026)
Constrained Group Relative Policy Optimization
by: Girgis, Roger, et al.
Published: (2026)
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization
by: Chen, Yifei, et al.
Published: (2026)