| Main Authors: | Zhang, Xiaoyun, Yuan, Xiaojian, Huang, Di, You, Wang, Hu, Chen, Ruan, Jingqing, Jian, Ai, Chen, Kejiang, Hu, Xing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.10959 |
Similar Items
Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
by: Zhang, Xiaoyun, et al.
Published: (2025)
PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling
by: Jian, Ai, et al.
Published: (2025)
Explainable Reinforcement Learning via a Causal World Model
by: Yu, Zhongwei, et al.
Published: (2023)
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
by: Jian, Ai, et al.
Published: (2026)
Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks
by: Zhao, Jiawei, et al.
Published: (2024)
Learning Causal Dynamics Models in Object-Oriented Environments
by: Yu, Zhongwei, et al.
Published: (2024)
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
by: Zhang, Xiaoyun, et al.
Published: (2025)
MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices
by: Hu, Kan, et al.
Published: (2024)
Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
A Closer Look at Machine Unlearning for Large Language Models
by: Yuan, Xiaojian, et al.
Published: (2024)
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
by: Zhao, Jiawei, et al.
Published: (2023)
Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making
by: Ruan, Jingqing, et al.
Published: (2023)
The Relationship Between Grip Strength and Cognitive Impairment: Evidence From NHANES 2011–2014
by: Wenyi Nie, et al.
Published: (2025)
Unlocking High‐Concentration PET Upcycling via Site‐Decoupled Copper Catalysis
by: Chuan Gang, et al.
Published: (2025)
The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
by: Yan, Renye, et al.
Published: (2024)
Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents
by: Yang, Haojin, et al.
Published: (2026)
TD3-Sched: Learning to Orchestrate Container-based Cloud-Edge Resources via Distributed Reinforcement Learning
by: Song, Shengye, et al.
Published: (2025)
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
by: Deng, Jia, et al.
Published: (2025)
On the Vulnerability of Text Sanitization
by: Tong, Meng, et al.
Published: (2024)
McKean-Vlasov SDEs with Singular Coefficients and Distribution Dependent Noise: Well-posedness and Regularity
by: Huang, Xing
Published: (2023)
Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models
by: Peng, Ying, et al.
Published: (2025)
Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data
by: Dong, Fengxian, et al.
Published: (2026)
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
by: Hu, Xing, et al.
Published: (2026)
Predicting LLM Output Length via Entropy-Guided Representations
by: Xie, Huanyi, et al.
Published: (2026)
Rationality Measurement and Theory for Reinforcement Learning Agents
by: Qian, Kejiang, et al.
Published: (2026)
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
by: Wu, Haoyuan, et al.
Published: (2025)
Unlocking the Potential of the RUBY Reporter System: How to Address Its Challenges in Plant‐Environment Interaction Research?
by: Zijian Hu, et al.
Published: (2025)
Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions
by: Zhang, Xiaoyun, et al.
Published: (2024)
State Entropy Regularization for Robust Reinforcement Learning
by: Ashlag, Yonatan, et al.
Published: (2025)
Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)
Revisiting Data Augmentation in Deep Reinforcement Learning
by: Hu, Jianshu, et al.
Published: (2024)
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2024)
Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation
by: Zhang, Yeqin, et al.
Published: (2025)
GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
by: Liu, Ziru, et al.
Published: (2025)
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
by: Tang, Zengzipeng, et al.
Published: (2026)
Set‐membership state estimation for complex networks with chance constraints under multi‐modal deception attacks
by: Miaomiao Shi, et al.
Published: (2024)
StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
by: Wu, Yutong, et al.
Published: (2025)
SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
by: Ai, Zhengyang, et al.
Published: (2026)
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
by: Zheng, Binbin, et al.
Published: (2026)