| Main Authors: | Guo, Weiyang, Shi, Zesheng, Zhao, Liye, Ma, Jiayuan, Zhu, Zeen, He, Junxian, Zhang, Min, Li, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.09455 |
Similar Items
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
by: Zhang, Luan, et al.
Published: (2026)
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
by: Qu, Changle, et al.
Published: (2026)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
by: Zeng, Weihao, et al.
Published: (2024)
Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning
by: Guo, Weiyang, et al.
Published: (2025)
Skill Weaving: Efficient LLM Improvement via Modular Skillpacks
by: Li, Zhuo, et al.
Published: (2026)
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management
by: Su, Xuerui, et al.
Published: (2025)
CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
by: Zhang, Situo, et al.
Published: (2026)
Non-myopic Generation of Language Models for Reasoning and Planning
by: Ma, Chang, et al.
Published: (2024)
Safety Alignment via Constrained Knowledge Unlearning
by: Shi, Zesheng, et al.
Published: (2025)
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
by: Chen, Aili, et al.
Published: (2026)
Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement
by: Fang, Wenji, et al.
Published: (2026)
JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models
by: Chi, Ce, et al.
Published: (2025)
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
by: Huang, Kexin, et al.
Published: (2026)
Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning
by: Gong, Siyu, et al.
Published: (2026)
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
by: Zhao, Yufeng, et al.
Published: (2025)
Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs
by: Li, Wu, et al.
Published: (2026)
S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
by: Xu, Beining, et al.
Published: (2025)
MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
by: Guo, Weiyang, et al.
Published: (2025)
AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation
by: Jiang, Xi, et al.
Published: (2026)
Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework
by: Ma, Xilai, et al.
Published: (2026)
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
by: Chen, Jiawei, et al.
Published: (2025)
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
by: Zheng, Hongjie, et al.
Published: (2025)
Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs
by: Li, Junxian, et al.
Published: (2025)
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
by: Huang, Yuzhen, et al.
Published: (2025)
Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models
by: Hong, Zesheng, et al.
Published: (2026)
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
by: Li, Junlong, et al.
Published: (2025)
Evolutionary Discovery of Heuristic Policies for Traffic Signal Control
by: Wang, Ruibing, et al.
Published: (2025)
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
by: Chen, Zhipeng, et al.
Published: (2025)
EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools
by: Zhang, Boer, et al.
Published: (2026)
Understanding Tool-Integrated Reasoning
by: Lin, Heng, et al.
Published: (2025)
PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
by: Wu, Feijie, et al.
Published: (2025)
Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning
by: Shen, Xintian, et al.
Published: (2026)
DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning
by: He, Yang, et al.
Published: (2026)
CAREAgent: Clinical Agent with Structured Reasoning and Tool-Integrated for Order Generation
by: Hou, Ruihui, et al.
Published: (2026)
ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset
by: Yang, Chen, et al.
Published: (2025)
JudgeSQL: Reasoning over SQL Candidates with Weighted Consensus Tournament
by: Bai, Jiayuan, et al.
Published: (2025)
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
by: Zeng, Weihao, et al.
Published: (2026)
User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation
by: Du, Jing, et al.
Published: (2026)