| Main Authors: | Chen, Zhengyu, Yang, Jinluan, Xiao, Teng, Zhou, Ruochen, Zhang, Luan, Xi, Xiangyu, Shi, Xiaowei, Wang, Wei, Wang, Jinggang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.11184 |
Similar Items
From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test-Time Scaling
by: Chen, Zhengyu, et al.
Published: (2025)
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
by: Wang, Jianing, et al.
Published: (2026)
AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering
by: Wang, Yuxin, et al.
Published: (2026)
TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
by: Yang, Jinluan, et al.
Published: (2026)
Adaptive Tool Generation with Models as Tools and Reinforcement Learning
by: Wang, Chenpeng, et al.
Published: (2025)
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
by: Guo, Ziyu, et al.
Published: (2025)
Research on the Integration of Embodied Intelligence and Reinforcement Learning in Textual Domains
by: Wang, Haonan, et al.
Published: (2025)
More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents
by: Xiong, Weimin, et al.
Published: (2025)
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
by: Zhang, Luan, et al.
Published: (2026)
Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation
by: Du, Xuan, et al.
Published: (2026)
Does Learning Mathematical Problem-Solving Generalize to Broader Reasoning?
by: Zhou, Ruochen, et al.
Published: (2025)
SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent
by: Zhong, Wenlin, et al.
Published: (2026)
IRIS: Interleaved Reinforcement with Incremental Staged Curriculum for Cross-Lingual Mathematical Reasoning
by: Gupta, Navya, et al.
Published: (2026)
Interleaved Reasoning for Large Language Models via Reinforcement Learning
by: Xie, Roy, et al.
Published: (2025)
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning
by: Li, Zhaoyi, et al.
Published: (2026)
Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning
by: Zhang, Hang, et al.
Published: (2026)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)
ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers
by: Sengupta, Saptarshi, et al.
Published: (2025)
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)
When Domains Interact: Asymmetric and Order-Sensitive Cross-Domain Effects in Reinforcement Learning for Reasoning
by: Yang, Wang, et al.
Published: (2026)
Think-J: Learning to Think for Generative LLM-as-a-Judge
by: Huang, Hui, et al.
Published: (2025)
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
by: Wu, Qinzhuo, et al.
Published: (2024)
SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?
by: Chen, Shiqi, et al.
Published: (2026)
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
by: Wen, Haoming, et al.
Published: (2025)
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
by: Xia, Sirui, et al.
Published: (2024)
AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning
by: Wei, Yifan, et al.
Published: (2025)
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
by: Tian, Changyao, et al.
Published: (2024)
Detecting AI-Generated Texts in Cross-Domains
by: Zhou, You, et al.
Published: (2024)
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
by: Chen, Dongping, et al.
Published: (2024)
Domain Generalization via Causal Adjustment for Cross-Domain Sentiment Analysis
by: Wang, Siyin, et al.
Published: (2024)
SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL
by: Hua, Harper, et al.
Published: (2026)
SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity
by: Xi, Xiangyu, et al.
Published: (2025)
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
by: Huang, Kun, et al.
Published: (2025)
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
by: Nie, Chaojun, et al.
Published: (2025)
Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts
by: Yang, Jinluan, et al.
Published: (2024)
Auto-ABSA: Cross-Domain Aspect Detection and Sentiment Analysis Using Auxiliary Sentences
by: Wang, Teng, et al.
Published: (2022)
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
by: Seed, ByteDance, et al.
Published: (2025)
Discovering Invariant Neighborhood Patterns for Heterophilic Graphs
by: Yang, Jinluan, et al.
Published: (2024)
CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization
by: Zhang, Ruochen, et al.
Published: (2023)
Cross-Lingual Interleaving for Speech Language Models
by: Moumen, Adel, et al.
Published: (2025)