Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xiaoyun; Yuan, Xiaojian; Huang, Di; You, Wang; Hu, Chen; Ruan, Jingqing; Jian, Ai; Chen, Kejiang; Hu, Xing
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2510.10959

Similar Items

Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
by: Zhang, Xiaoyun, et al.
Published: (2025)

PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling
by: Jian, Ai, et al.
Published: (2025)

Explainable Reinforcement Learning via a Causal World Model
by: Yu, Zhongwei, et al.
Published: (2023)

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
by: Jian, Ai, et al.
Published: (2026)

Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks
by: Zhao, Jiawei, et al.
Published: (2024)

Learning Causal Dynamics Models in Object-Oriented Environments
by: Yu, Zhongwei, et al.
Published: (2024)

When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
by: Zhang, Xiaoyun, et al.
Published: (2025)

MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices
by: Hu, Kan, et al.
Published: (2024)

Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)

Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)

A Closer Look at Machine Unlearning for Large Language Models
by: Yuan, Xiaojian, et al.
Published: (2024)

Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
by: Zhao, Jiawei, et al.
Published: (2023)

Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making
by: Ruan, Jingqing, et al.
Published: (2023)

The Relationship Between Grip Strength and Cognitive Impairment: Evidence From NHANES 2011–2014
by: Wenyi Nie, et al.
Published: (2025)

Unlocking High‐Concentration PET Upcycling via Site‐Decoupled Copper Catalysis
by: Chuan Gang, et al.
Published: (2025)

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
by: Yan, Renye, et al.
Published: (2024)

Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents
by: Yang, Haojin, et al.
Published: (2026)

TD3-Sched: Learning to Orchestrate Container-based Cloud-Edge Resources via Distributed Reinforcement Learning
by: Song, Shengye, et al.
Published: (2025)

Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
by: Deng, Jia, et al.
Published: (2025)

On the Vulnerability of Text Sanitization
by: Tong, Meng, et al.
Published: (2024)

McKean-Vlasov SDEs with Singular Coefficients and Distribution Dependent Noise: Well-posedness and Regularity
by: Huang, Xing
Published: (2023)

Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models
by: Peng, Ying, et al.
Published: (2025)

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data
by: Dong, Fengxian, et al.
Published: (2026)

SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
by: Hu, Xing, et al.
Published: (2026)

Predicting LLM Output Length via Entropy-Guided Representations
by: Xie, Huanyi, et al.
Published: (2026)

Rationality Measurement and Theory for Reinforcement Learning Agents
by: Qian, Kejiang, et al.
Published: (2026)

ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
by: Wu, Haoyuan, et al.
Published: (2025)

Unlocking the Potential of the RUBY Reporter System: How to Address Its Challenges in Plant‐Environment Interaction Research?
by: Zijian Hu, et al.
Published: (2025)

Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions
by: Zhang, Xiaoyun, et al.
Published: (2024)

State Entropy Regularization for Robust Reinforcement Learning
by: Ashlag, Yonatan, et al.
Published: (2025)

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)

Revisiting Data Augmentation in Deep Reinforcement Learning
by: Hu, Jianshu, et al.
Published: (2024)

Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2024)

Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation
by: Zhang, Yeqin, et al.
Published: (2025)

GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
by: Liu, Ziru, et al.
Published: (2025)

GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
by: Tang, Zengzipeng, et al.
Published: (2026)

Set‐membership state estimation for complex networks with chance constraints under multi‐modal deception attacks
by: Miaomiao Shi, et al.
Published: (2024)

StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
by: Wu, Yutong, et al.
Published: (2025)

SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
by: Ai, Zhengyang, et al.
Published: (2026)

SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
by: Zheng, Binbin, et al.
Published: (2026)