| Main Authors: | Maass, Wolfgang, Janzen, Sabine |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.26657 |
Similar Items
Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning
by: Maass, Wolfgang, et al.
Published: (2026)
Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
by: Lepel, Olivier, et al.
Published: (2024)
MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
by: Li, Ruoran, et al.
Published: (2026)
Cumulative Path-Level Semantic Reasoning for Inductive Knowledge Graph Completion
by: Wang, Jiapu, et al.
Published: (2026)
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
by: He, Shuo, et al.
Published: (2026)
Milestone-Guided Policy Learning for Long-Horizon Language Agents
by: Wang, Zixuan, et al.
Published: (2026)
PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
by: Li, Yu, et al.
Published: (2026)
Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
by: Yang, Zhicheng, et al.
Published: (2026)
The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
by: Li, Yingru, et al.
Published: (2026)
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
by: Chen, Dillon Z., et al.
Published: (2026)
Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
by: Xu, Pei, et al.
Published: (2025)
Implementing Cumulative Functions with Generalized Cumulative Constraints
by: Schaus, Pierre, et al.
Published: (2025)
Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL
by: Choi, Jinwoo, et al.
Published: (2026)
Incorporating Metabolic Information into LLMs for Anomaly Detection in Clinical Time-Series
by: Rahman, Maxx Richard, et al.
Published: (2024)
Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient
by: Zhang, Xiangyuan, et al.
Published: (2023)
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
by: Bai, Qinbo, et al.
Published: (2023)
Post-Training with Policy Gradients: Optimality and the Base Model Barrier
by: Mousavi-Hosseini, Alireza, et al.
Published: (2026)
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm
by: Bai, Qinbo, et al.
Published: (2024)
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
by: Meng, Wenjia, et al.
Published: (2024)
UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting
by: Gokhman, Ruslan
Published: (2025)
$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
by: Chen, Xianwei, et al.
Published: (2026)
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
by: Cherepanov, Egor, et al.
Published: (2025)
Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
by: Wang, Yutong, et al.
Published: (2025)
HorizonBench: Long-Horizon Personalization with Evolving Preferences
by: Li, Shuyue Stella, et al.
Published: (2026)
Performative Policy Gradient: Optimality in Performative Reinforcement Learning
by: Basu, Debabrota, et al.
Published: (2025)
Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction
by: Curtis, Aidan, et al.
Published: (2024)
DiGAN: Diffusion-Guided Attention Network for Early Alzheimer's Disease Detection
by: Rahman, Maxx Richard, et al.
Published: (2026)
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
by: Barakat, Anas, et al.
Published: (2024)
Verifiable Benchmarking of Long-Horizon Spatial Biology
by: Diks, Ian, et al.
Published: (2026)
Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective
by: Zhang, Yuheng, et al.
Published: (2026)
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
by: Lu, Yijun, et al.
Published: (2026)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
by: Luo, Haotian, et al.
Published: (2025)
HiMem: Hierarchical Long-Term Memory for LLM Long-Horizon Agents
by: Zhang, Ningning, et al.
Published: (2026)
Learning Long-Horizon Predictions for Quadrotor Dynamics
by: Rao, Pratyaksh Prabhav, et al.
Published: (2024)
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
by: Lobo, Elita, et al.
Published: (2026)
LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning
by: Pushkin, Denys, et al.
Published: (2026)
Double Horizon Model-Based Policy Optimization
by: Kubo, Akihiro, et al.
Published: (2025)
Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization
by: Kang, Inha, et al.
Published: (2025)
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
by: Chen, Haolin, et al.
Published: (2026)
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)