| Main authors: | Feng, Zihao; Wang, Xiaoxue; Wu, Bowen; Cao, Hailong; Zhao, Tiejun; Yu, Qun; Wang, Baoxun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2509.14718 |
Similar items
Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling
by: Feng, Zihao, et al.
Published: (2025)
Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy
by: Feng, Zihao, et al.
Published: (2025)
ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)
AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting
by: Li, Renda, et al.
Published: (2025)
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
by: Wang, Zhenting, et al.
Published: (2025)
RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward
by: Wang, Zongsheng, et al.
Published: (2025)
ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning
by: Zeng, Xingshan, et al.
Published: (2025)
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
by: Lin, Zihan, et al.
Published: (2026)
Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
by: Zhu, Dongsheng, et al.
Published: (2025)
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
by: Zhang, Shenao, et al.
Published: (2025)
Enhancing Large Language Models' Machine Translation via Dynamic Focus Anchoring
by: Ding, Qiuyu, et al.
Published: (2025)
Tool Learning with Foundation Models
by: Qin, Yujia, et al.
Published: (2023)
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
by: Zhang, Yudi, et al.
Published: (2025)
ToolExpander: Extending the Frontiers of Tool-Using Reinforcement Learning to Weak LLMs
by: Chen, Fu, et al.
Published: (2025)
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
by: Zou, Jiaru, et al.
Published: (2025)
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
by: Wang, Boshi, et al.
Published: (2024)
iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use
by: Zeng, Yirong, et al.
Published: (2025)
MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation
by: Wang, Xiaohan, et al.
Published: (2024)
Learning Harmonized Representations for Speculative Sampling
by: Zhang, Lefan, et al.
Published: (2024)
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
by: Wang, Shaobo, et al.
Published: (2026)
Cross-Domain Bilingual Lexicon Induction via Pretrained Language Models
by: Ding, Qiuyu, et al.
Published: (2025)
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
by: Jiang, Guochao, et al.
Published: (2025)
Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History
by: Wu, Bowen, et al.
Published: (2025)
ToolACE: Winning the Points of LLM Function Calling
by: Liu, Weiwen, et al.
Published: (2024)
CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding
by: Liu, Yang, et al.
Published: (2024)
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
by: Ma, Qinghe, et al.
Published: (2026)
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)
Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
by: Wu, Haoze, et al.
Published: (2025)
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
by: Xu, Minrui, et al.
Published: (2026)
Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization
by: Chen, Zhengyu, et al.
Published: (2025)
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
by: Lu, Pan, et al.
Published: (2025)
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
by: Zeng, Xingshan, et al.
Published: (2025)
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
by: Du, Weihua, et al.
Published: (2025)
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
by: Goldie, Anna, et al.
Published: (2025)
When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards
by: Fan, Mingyuan, et al.
Published: (2026)
PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning
by: Xiang, Hui, et al.
Published: (2025)
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)
Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe
by: Wu, Xixi, et al.
Published: (2026)
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
by: Hao, Shibo, et al.
Published: (2023)
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
by: Shi, Taiwei, et al.
Published: (2025)