| Main Authors: | Zhang, Zhen, Song, Kaiqiang, Wang, Xun, Hu, Yebowen, Yan, Weixiang, Zhao, Chenyang, Zou, Henry Peng, Deng, Haoyun, Indurthi, Sathish Reddy, Liu, Shujian, Ma, Simin, Wang, Xiaoyang, Wang, Xin Eric, Wang, Song |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.12268 |
Similar Items
Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication
by: Lu, Yiming, et al.
Published: (2025)
Aligning Multilingual Reasoning with Verifiable Semantics from a High-Resource Expert Model
by: Faisal, Fahim, et al.
Published: (2025)
TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
by: Ma, Simin, et al.
Published: (2025)
Complex Logical Instruction Generation
by: Zhang, Mian, et al.
Published: (2025)
Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment
by: Gulhane, Radha, et al.
Published: (2025)
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
by: Yin, Ming, et al.
Published: (2025)
WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)
Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets
by: Indurthi, Sathish Reddy, et al.
Published: (2024)
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
by: Wu, Tong, et al.
Published: (2024)
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models
by: Li, Wenzhe, et al.
Published: (2025)
SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs
by: Hu, Yebowen, et al.
Published: (2024)
Can Large Language Models do Analytical Reasoning?
by: Hu, Yebowen, et al.
Published: (2024)
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
by: Hu, Yebowen, et al.
Published: (2024)
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)
InFoBench: Evaluating Instruction Following Ability in Large Language Models
by: Qin, Yiwei, et al.
Published: (2024)
Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization
by: Ding, Yifeng, et al.
Published: (2025)
Eliciting Behaviors in Multi-Turn Conversations
by: Huang, Jing, et al.
Published: (2025)
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
by: Zeng, Xingshan, et al.
Published: (2025)
Experience-Evolving Multi-Turn Tool-Use Agent with Hybrid Episodic-Procedural Memory
by: Li, Sijia, et al.
Published: (2025)
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
by: Wang, Prince Zizhuang, et al.
Published: (2026)
Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges
by: Wang, Hongru, et al.
Published: (2025)
SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization
by: Cho, Sangwoo, et al.
Published: (2024)
Multi-Turn Code Generation Through Single-Step Rewards
by: Jain, Arnav Kumar, et al.
Published: (2025)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)
Proactive Guidance of Multi-Turn Conversation in Industrial Search
by: Li, Xiaoyu, et al.
Published: (2025)
AlphaQuanter: An End-to-End Tool-Augmented Agentic Reinforcement Learning Framework for Stock Trading
by: Deng, Zheye, et al.
Published: (2025)
StepTool: Enhancing Multi-Step Tool Usage in LLMs via Step-Grained Reinforcement Learning
by: Yu, Yuanqing, et al.
Published: (2024)
ToolRM: Towards Agentic Tool-Use Reward Modeling
by: Li, Renhao, et al.
Published: (2025)
Agentic Reinforcement Learning with Implicit Step Rewards
by: Liu, Xiaoqian, et al.
Published: (2025)
Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
by: Jiayang, Cheng, et al.
Published: (2026)
Polarity Calibration for Opinion Summarization
by: Lei, Yuanyuan, et al.
Published: (2024)
A Versatile Multimodal Agent for Multimedia Content Generation
by: Zhang, Daoan, et al.
Published: (2026)
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training
by: Guo, Taicheng, et al.
Published: (2025)
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)
ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks
by: Li, Haoyun, et al.
Published: (2026)
MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering
by: Dang, Jisheng, et al.
Published: (2025)
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
by: Agarwal, Aradhye, et al.
Published: (2026)
Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration
by: Modecrua, Wachiravit, et al.
Published: (2026)
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
by: Chakraborty, Amartya, et al.
Published: (2025)
TAMTRL: Teacher-Aligned Reward Reshaping for Multi-Turn Reinforcement Learning in Long-Context Compression
by: Wang, Li, et al.
Published: (2026)