| Main Authors: | Wu, Jiahao, Lu, Ning, Liu, Shengcai, Wang, Kun, Yang, Yanting, Qing, Li, Tang, Ke |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.25184 |
Similar Items
Backdoor Graph Condensation
by: Wu, Jiahao, et al.
Published: (2024)
Policy and World Modeling Co-Training for Language Agents
by: Lu, Ning, et al.
Published: (2026)
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)
Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
by: Cui, Peng, et al.
Published: (2026)
Large Language Models can be Guided to Evade AI-Generated Text Detection
by: Lu, Ning, et al.
Published: (2023)
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
by: Yang, Rui, et al.
Published: (2026)
TF-DCon: Leveraging Large Language Models (LLMs) to Empower Training-Free Dataset Condensation for Content-Based Recommendation
by: Wu, Jiahao, et al.
Published: (2023)
TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
by: Yang, Tianze, et al.
Published: (2026)
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
by: Lu, Ning, et al.
Published: (2025)
Neural QAOA$^{2}$: Differentiable Joint Graph Partitioning and Parameter Initialization for Quantum Combinatorial Optimization
by: Zheng, Zubin, et al.
Published: (2026)
LLM-Driven Instance-Specific Heuristic Generation and Selection
by: Zhang, Shaofeng, et al.
Published: (2025)
Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend
by: Lu, Ning, et al.
Published: (2023)
Causal Consistency Regularization: Training Verifiably Sensitive Reasoning in Large Language Models
by: Akter, Sanjeda, et al.
Published: (2025)
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
by: Feng, Zhangying, et al.
Published: (2025)
Prompt-Driven Low-Altitude Edge Intelligence: Modular Agents and Generative Reasoning
by: You, Jiahao, et al.
Published: (2026)
Flow-GRPO: Training Flow Matching Models via Online RL
by: Liu, Jie, et al.
Published: (2025)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
SemDiff: Generating Natural Unrestricted Adversarial Examples via Semantic Attributes Optimization in Diffusion Models
by: Dai, Zeyu, et al.
Published: (2025)
MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
by: Feng, Laingjun, et al.
Published: (2025)
How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
by: Zhu, Rui, et al.
Published: (2026)
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
by: Cao, Shiyi, et al.
Published: (2025)
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
by: Hu, Qinghao, et al.
Published: (2025)
On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL
by: He, Haoyang, et al.
Published: (2025)
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models
by: Yang, Junyao, et al.
Published: (2026)
Ancestral Mamba: Enhancing Selective Discriminant Space Model with Online Visual Prototype Learning for Efficient and Robust Discriminant Approach
by: Qin, Jiahao, et al.
Published: (2025)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
by: Wu, Bo, et al.
Published: (2025)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
by: Qu, Yun, et al.
Published: (2025)
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
by: Zhang, Jun, et al.
Published: (2025)
SAMEdge: An Edge-cloud Video Analytics Architecture for the Segment Anything Model
by: Lu, Rui, et al.
Published: (2024)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024)
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
by: Liu, Jiayu, et al.
Published: (2025)
TrainVerify: Equivalence-Based Verification for Distributed LLM Training
by: Lu, Yunchi, et al.
Published: (2025)
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
by: Wang, Qinsi, et al.
Published: (2025)
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
by: Zhao, Qiannian, et al.
Published: (2026)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)
AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design
by: Lv, Haoze, et al.
Published: (2026)
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
by: Wang, Yuanfu, et al.
Published: (2026)
Revisiting Model Interpolation for Efficient Reasoning
by: Wu, Taiqiang, et al.
Published: (2025)