Library Catalog

Bibliographic Details
Main Authors:	Dwyer, Madeleine; Sobey, Adam; Chapman, Adriane
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.21282

Similar Items

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning
by: Li, Yuan, et al.
Published: (2026)

From $\log π$ to $π$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight
by: Fu, Xiaoliang, et al.
Published: (2026)

Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization
by: Liu, Shixuan, et al.
Published: (2024)

Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL
by: Mao, Hanyi, et al.
Published: (2025)

SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
by: Gaven, Loris, et al.
Published: (2024)

Soft Deterministic Policy Gradient with Gaussian Smoothing
by: Na, Hyunjun, et al.
Published: (2026)

It's About Time: Temporal References in Emergent Communication
by: Lipinski, Olaf, et al.
Published: (2023)

PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
by: Huang, Nai-Chieh, et al.
Published: (2023)

What Can You Do When You Have Zero Rewards During RL?
by: Prakash, Jatin, et al.
Published: (2025)

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
by: Su, Xuerui, et al.
Published: (2025)

Trust Regions for Explanations via Black-Box Probabilistic Certification
by: Dhurandhar, Amit, et al.
Published: (2024)

Smooth Gate Functions for Soft Advantage Policy Optimization
by: Denisov, Egor, et al.
Published: (2026)

Trust Regions Sell, But Who's Buying? Overlap Geometry as an Alternative Trust Region for Policy Optimization
by: Trivedi, Gaurish, et al.
Published: (2026)

Trust Region Masking for Long-Horizon LLM Reinforcement Learning
by: Li, Yingru, et al.
Published: (2025)

Understanding Fixed Predictions via Confined Regions
by: Lawless, Connor, et al.
Published: (2025)

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
by: Yao, Qingmao, et al.
Published: (2025)

Trust-Region Adaptive Policy Optimization
by: Su, Mingyu, et al.
Published: (2025)

Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency
by: Dwyer, Joe
Published: (2026)

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)

Attention Smoothing Is All You Need For Unlearning
by: Zade, Saleh Zare, et al.
Published: (2026)

Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)

Trust-Region Behavior Blending for On-Policy Distillation
by: Plyusov, Daniil, et al.
Published: (2026)

Effective Confidence Region Prediction Using Probability Forecasters
by: Lindsay, David, et al.
Published: (2024)

On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)

Token-Efficient RL for LLM Reasoning
by: Lee, Alan, et al.
Published: (2025)

Plan Before You Trade: Inference-Time Optimization for RL Trading Agents
by: Go, Eun, et al.
Published: (2026)

Trust Region Q Adjoint Matching
by: Dong, Yonghoon, et al.
Published: (2026)

Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses
by: Cahyono, Joshua Adrian, et al.
Published: (2025)

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
by: Wang, Zixia, et al.
Published: (2025)

Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
by: Yang, Zhihe, et al.
Published: (2025)

Weight Clipping for Deep Continual and Reinforcement Learning
by: Elsayed, Mohamed, et al.
Published: (2024)

Matrix Low-Rank Trust Region Policy Optimization
by: Rozada, Sergio, et al.
Published: (2024)

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
by: Zhao, Huaqin, et al.
Published: (2024)

Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents
by: K, Deeraj S, et al.
Published: (2026)

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
by: Sokar, Ghada, et al.
Published: (2024)

Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
by: Kumarappan, Adarsh, et al.
Published: (2025)

Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)

STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order
by: Gu, Chengyang, et al.
Published: (2026)