Bibliographic Details
Main Authors:	Padula, Alexander G., Soemers, Dennis J. N. J.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2410.17126

Similar Items

On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)

FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
by: Li, Zelong, et al.
Published: (2024)

Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards
by: Pavlenko, Kirill, et al.
Published: (2026)

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
by: Feng, Xiao, et al.
Published: (2026)

On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
by: Zhou, Jin Peng, et al.
Published: (2025)

ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
by: Wu, Yang, et al.
Published: (2024)

Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
by: Namgoong, Hyuk, et al.
Published: (2024)

Emergent Representations of Program Semantics in Language Models Trained on Programs
by: Jin, Charles, et al.
Published: (2023)

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)

Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL
by: Zhou, Yixiao, et al.
Published: (2026)

GLIDE-RL: Grounded Language Instruction through DEmonstration in RL
by: Kharyal, Chaitanya, et al.
Published: (2024)

Feedback Loops With Language Models Drive In-Context Reward Hacking
by: Pan, Alexander, et al.
Published: (2024)

Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management
by: Lu, Miao, et al.
Published: (2025)

Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)

Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
by: Li, Ming, et al.
Published: (2025)

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
by: Li, Kenneth, et al.
Published: (2022)

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
by: Yu, Hongli, et al.
Published: (2025)

Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck?
by: Renduchintala, H S V N S Kowndinya, et al.
Published: (2026)

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training
by: Liu, Mingjie, et al.
Published: (2025)

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
by: Hage, Tarjei Paule, et al.
Published: (2026)

Learning to Reason as Action Abstractions with Scalable Mid-Training RL
by: Zhang, Shenao, et al.
Published: (2025)

Ludax: A GPU-Accelerated Domain Specific Language for Board Games
by: Todd, Graham, et al.
Published: (2025)

AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents
by: Xu, Shuyuan, et al.
Published: (2024)

Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness
by: Parthasarathy, Ambreesh, et al.
Published: (2025)

Words as Beacons: Guiding RL Agents with High-Level Language Prompts
by: Ruiz-Gonzalez, Unai, et al.
Published: (2024)

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
by: Limozin, Alexis, et al.
Published: (2026)

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
by: Yang, Zhuolin, et al.
Published: (2026)

Automated Rewards via LLM-Generated Progress Functions
by: Sarukkai, Vishnu, et al.
Published: (2024)

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
by: Damani, Mehul, et al.
Published: (2025)

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
by: Akhauri, Yash, et al.
Published: (2024)

The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
by: Pternea, Moschoula, et al.
Published: (2024)

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
by: Djuhera, Aladin, et al.
Published: (2026)

Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks
by: Shen, Chen, et al.
Published: (2026)

Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training
by: Saha, Rohan, et al.
Published: (2024)

Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training
by: Ilin, Aleksei, et al.
Published: (2025)

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
by: Qian, Cheng, et al.
Published: (2025)