| Main Authors: | Li, Chusen, Liu, Zhou, Zhou, Shuigeng, Zhang, Wentao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28699 |
Similar Items
Adaptive Stopping for Multi-Turn LLM Reasoning
by: Zhou, Xiaofan, et al.
Published: (2026)
Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
by: He, Zhida, et al.
Published: (2026)
Multi-level Advantage Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
by: Zhao, Xutong, et al.
Published: (2025)
TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning
by: Tayebati, Sina, et al.
Published: (2026)
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
by: Rida, Adam
Published: (2026)
Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training
by: Fang, Yangyi, et al.
Published: (2026)
Imagination-Limited Q-Learning for Offline Reinforcement Learning
by: Liu, Wenhui, et al.
Published: (2025)
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
by: Ning, Yansong, et al.
Published: (2025)
Exact Is Easier: Credit Assignment for Cooperative LLM Agents
by: Chen, Yanjun, et al.
Published: (2026)
TRACER: Persistent Regularization for Robust Multimodal Finetuning
by: Asadollahzadeh, Hesam, et al.
Published: (2026)
LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation
by: Zhu, Guobin, et al.
Published: (2025)
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
by: Liu, Licheng, et al.
Published: (2025)
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
by: Wang, Kangrui, et al.
Published: (2025)
Discovering Process-Outcome Credit in Multi-Step LLM Reasoning
by: Wang, Xiangwei, et al.
Published: (2026)
scI2CL: Effectively Integrating Single-cell Multi-omics by Intra- and Inter-omics Contrastive Learning
by: Liu, Wuchao, et al.
Published: (2025)
An LLM-Powered Cooperative Framework for Large-Scale Multi-Vehicle Navigation
by: Zhou, Yuping, et al.
Published: (2025)
Fairness Amidst Non-IID Graph Data: A Literature Review
by: Zhang, Wenbin, et al.
Published: (2022)
One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction
by: Guan, Jihong, et al.
Published: (2026)
Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents
by: Hua, Yun, et al.
Published: (2025)
Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
by: Gao, Xiancheng, et al.
Published: (2025)
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning
by: Jiang, Xitai, et al.
Published: (2026)
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
by: Wang, Zihan, et al.
Published: (2025)
MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning
by: Li, Xuchen, et al.
Published: (2026)
GRPO-$λ$: Credit Assignment improves LLM Reasoning
by: Parthasarathi, Prasanna, et al.
Published: (2025)
AIR: Unifying Individual and Collective Exploration in Cooperative Multi-Agent Reinforcement Learning
by: Zhou, Guangchong, et al.
Published: (2024)
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
by: Qu, Yun, et al.
Published: (2024)
MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation
by: Ekbote, Chanakya, et al.
Published: (2025)
Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning
by: Jiao, Zhengbo, et al.
Published: (2026)
LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning
by: Zhu, Yu, et al.
Published: (2026)
Reasoning without Regret
by: Chitra, Tarun
Published: (2025)
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
by: Zhou, Yang, et al.
Published: (2025)
OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning
by: Li, Yu, et al.
Published: (2026)
FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)
Evolutionary Enhanced Multi-Agent Reinforcement Learning for Cooperative Air Combat
by: Li, Chengwei, et al.
Published: (2026)
Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction
by: Bao, Han, et al.
Published: (2026)
Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction
by: Chen, Xingwu, et al.
Published: (2026)
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
by: Yan, Dawei, et al.
Published: (2025)
Nucleolus Credit Assignment for Effective Coalitions in Multi-agent Reinforcement Learning
by: Li, Yugu, et al.
Published: (2025)
AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting
by: Jiang, Xudong, et al.
Published: (2026)
AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
by: Zhao, Haotian, et al.
Published: (2026)