| Main Authors: | Wan, Yang, Cao, Zheng, Zhang, Zhenhao, Zeng, Zhengwen, Shen, Shuheng, Meng, Changhua, Zhu, Linchao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03664 |
Similar Items
Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
by: Li, Ming, et al.
Published: (2025)
MVP: Multiple View Prediction Improves GUI Grounding
by: Zhang, Yunzhu, et al.
Published: (2025)
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
by: Suo, Yucheng, et al.
Published: (2025)
Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective
by: Liu, Yunfei, et al.
Published: (2024)
Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization
by: Qiu, Xinyu, et al.
Published: (2026)
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
by: Zhao, Shuai, et al.
Published: (2025)
Mitigating Exposure Bias in Score-Based Generation of Molecular Conformations
by: Wang, Sijia, et al.
Published: (2024)
Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning
by: Zhu, Ting, et al.
Published: (2024)
UI-Venus-1.5 Technical Report
by: Venus Team, et al.
Published: (2026)
Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
by: Lyu, He, et al.
Published: (2026)
Hybrid Focal and Full-Range Attention Based Graph Transformers
by: Zhu, Minhong, et al.
Published: (2023)
Experience-Evolving Multi-Turn Tool-Use Agent with Hybrid Episodic-Procedural Memory
by: Li, Sijia, et al.
Published: (2025)
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
by: Hahn, Meera, et al.
Published: (2024)
Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding
by: Ramezan, Kimia, et al.
Published: (2025)
RiemannGL: Riemannian Geometry Changes Graph Deep Learning
by: Sun, Li, et al.
Published: (2026)
A Graph is Worth 1-bit Spikes: When Graph Contrastive Learning Meets Spiking Neural Networks
by: Li, Jintang, et al.
Published: (2023)
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
by: Cao, Ruike, et al.
Published: (2026)
Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias
by: Li, Chao, et al.
Published: (2025)
Turn-based Multi-Agent Reinforcement Learning Model Checking
by: Gross, Dennis
Published: (2025)
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
by: Zeng, Yifan, et al.
Published: (2026)
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
by: Wang, Zihan, et al.
Published: (2025)
Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
by: He, Zhida, et al.
Published: (2026)
TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
by: Djuhera, Aladin, et al.
Published: (2026)
Gym-Anything: Turn any Software into an Agent Environment
by: Aggarwal, Pranjal, et al.
Published: (2026)
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
by: Yin, Chenlong, et al.
Published: (2025)
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
by: Jali, Neharika, et al.
Published: (2026)
Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning
by: Wan, Qian, et al.
Published: (2026)
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
by: Tang, Fei, et al.
Published: (2025)
Turning Black Box into White Box: Dataset Distillation Leaks
by: Chen, Huajie, et al.
Published: (2026)
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
by: Prabhakar, Akshara, et al.
Published: (2025)
How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
by: Jaipersaud, Brandon, et al.
Published: (2025)
A Unified Perspective for Loss-Oriented Imbalanced Learning via Localization
by: Wang, Zitai, et al.
Published: (2023)
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
by: Coscia, Adam, et al.
Published: (2025)
AIM: Attributing, Interpreting, Mitigating Data Unfairness
by: Liu, Zhining, et al.
Published: (2024)
Stabilizing Off-Policy Training for Long-Horizon LLM Agent via Turn-Level Importance Sampling and Clipping-Triggered Normalization
by: Li, Chenliang, et al.
Published: (2025)
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
by: Kumarappan, Adarsh, et al.
Published: (2025)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)
TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation
by: Ono, Junya, et al.
Published: (2025)