| Main Authors: | Wang, Haoyu, Ma, Guozheng, Cui, Shugang, Kong, Yilun, Luo, Haotian, Shen, Li, Gao, Mengya, Wu, Yichao, Wang, Xiaogang, Tao, Dacheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.21754 |
Similar Items
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
by: Kong, Yilun, et al.
Published: (2025)
What Makes Value Learning Efficient in Residual Reinforcement Learning?
by: Ma, Guozheng, et al.
Published: (2026)
Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning
by: Ma, Guozheng, et al.
Published: (2025)
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
by: Ma, Guozheng, et al.
Published: (2025)
A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning
by: Ma, Guozheng, et al.
Published: (2022)
Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation
by: Ren, Zhiyao, et al.
Published: (2026)
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
by: Zhang, Junjie, et al.
Published: (2025)
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
by: Ma, Guozheng, et al.
Published: (2023)
Concept-Guided Backdoor Attack on Vision Language Models
by: Shen, Haoyu, et al.
Published: (2025)
ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
by: Liu, Xuxu, et al.
Published: (2025)
STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning
by: Zhang, Junjie, et al.
Published: (2026)
Safety Reasoning with Guidelines
by: Wang, Haoyu, et al.
Published: (2025)
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
by: Wang, Yibo, et al.
Published: (2025)
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
by: Huang, Junqin, et al.
Published: (2024)
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
by: Wang, Shuai, et al.
Published: (2024)
Decentralized Clinical Trials in the Era of Real‐World Evidence: A Critical Assessment of Recent Experiences
by: Hongwei Wang, et al.
Published: (2025)
The Role of miR‐124‐3p/UHRF1 in NaAsO2‐Induced Apoptosis of LX‐2 Cells via DNMT1/SOCS1
by: Mengyao Zhang, et al.
Published: (2025)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning
by: Kong, Yilun, et al.
Published: (2024)
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression
by: Zhang, Shaowei, et al.
Published: (2026)
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
by: Luo, Haotian, et al.
Published: (2025)
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
by: Luo, Haotian, et al.
Published: (2025)
The effect of air pollution and genetic susceptibility on systemic lupus erythematosus: comment on the article by Xing et al
by: Na Wang, et al.
Published: (2024)
Little ones can do big things: Small molecule inhibitors target PTPN2/PTPN1 for tumor immunotherapy
by: Junyu Wang, et al.
Published: (2024)
Paternal microbiota impacts offspring: health risks and reproductive insights
by: Junyu Wang, et al.
Published: (2024)
HiRegEx: Interactive Visual Query and Exploration of Multivariate Hierarchical Data
by: Li, Guozheng, et al.
Published: (2024)
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
by: Hu, Zixuan, et al.
Published: (2025)
Two nonfinitely based additively idempotent semirings of order four
by: Yue, Mengya, et al.
Published: (2026)
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains
by: Zhang, Yilun, et al.
Published: (2025)
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
by: Wang, Wenbin, et al.
Published: (2024)
CredID: Credible Multi-Bit Watermark for Large Language Models Identification
by: Jiang, Haoyu, et al.
Published: (2024)
Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning
by: Yuan, Mingqi, et al.
Published: (2025)
Catching Up Yet Still Falling Behind: Sources, Heterogeneity, and Implications of the Modest Female Educational Disadvantage in Rural China
by: Wensong Shen
Published: (2025)
Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping
by: Hua, Tongyan, et al.
Published: (2024)
"Don't Fall Behind": A Unified Framework of Dynastic Survival, Two-Stage Belief Error, and the Modern Involution Trap
by: Yang, Dong
Published: (2025)
On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
by: Guo, Tao, et al.
Published: (2025)
To Save Mobile Crowdsourcing from Cheap-talk: A Game Theoretic Learning Approach
by: Hao, Shugang, et al.
Published: (2023)
Algorithm Design for Continual Learning in IoT Networks
by: Hao, Shugang, et al.
Published: (2024)
Online Learning from Strategic Human Feedback in LLM Fine-Tuning
by: Hao, Shugang, et al.
Published: (2024)
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
by: Lu, Qingyu, et al.
Published: (2023)