:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Hao; Gu, Hao; Piao, Hongming; Gong, Kaixiong; Ye, Yuxiao; Yue, Xiangyu; Han, Sirui; Guo, Yike; Wu, Dapeng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2602.02244

Similar Items

TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection
by: Piao, Shengmin, et al.
Published: (2024)

Basic Reading Distillation
by: Zhou, Zhi, et al.
Published: (2025)

Supervised Fine-Tuning as Inverse Reinforcement Learning
by: Sun, Hao
Published: (2024)

Rotation-Preserving Supervised Fine-Tuning
by: Jin, Hangzhan, et al.
Published: (2026)

Staying Healthy While You Are Pregnant
Published: (2025)

On-Policy Supervised Fine-Tuning for Efficient Reasoning
by: Zhao, Anhao, et al.
Published: (2026)

Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning
by: Yu, Yongcan, et al.
Published: (2025)

Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations
by: Guo, Xin, et al.
Published: (2026)

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
by: Lu, Yuxiao, et al.
Published: (2024)

Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
by: Hao, Yifan, et al.
Published: (2025)

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
by: Zhang, Yiyuan, et al.
Published: (2024)

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
by: Diao, Muxi, et al.
Published: (2026)

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
by: Zhang, Zhejun, et al.
Published: (2024)

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning
by: Goel, Jyotin, et al.
Published: (2026)

Asymmetric Advantage Modulation Calibrates Entropy Dynamics in RLVR
by: Gu, Hengrui, et al.
Published: (2026)

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
by: Yang, Yuxiao, et al.
Published: (2026)

On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning
by: Li, Zhaoyi, et al.
Published: (2026)

Video-R1: Reinforcing Video Reasoning in MLLMs
by: Feng, Kaituo, et al.
Published: (2025)

Preserving Diversity in Supervised Fine-Tuning of Large Language Models
by: Li, Ziniu, et al.
Published: (2024)

Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
by: Li, Lujun, et al.
Published: (2025)

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)

BIFRÖST: 3D-Aware Image compositing with Language Instructions
by: Li, Lingxiao, et al.
Published: (2024)

How to Stay Curious while Avoiding Noisy TVs using Aleatoric Uncertainty Estimation
by: Mavor-Parker, Augustine N., et al.
Published: (2021)

Fine-Tuning Robot Policies While Maintaining User Privacy
by: Christie, Benjamin A., et al.
Published: (2025)

Preserving Multilingual Quality While Tuning Query Encoder on English Only
by: Vasilyev, Oleg, et al.
Published: (2024)

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
by: Pang, Jinlong, et al.
Published: (2025)

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
by: Jiang, Liming, et al.
Published: (2025)

Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning
by: Zhang, Yaqian, et al.
Published: (2026)

A Layer-wise Analysis of Supervised Fine-Tuning
by: Zhao, Qinghua, et al.
Published: (2026)

Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
by: Huang, Hong, et al.
Published: (2025)

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
by: Zhang, Hao, et al.
Published: (2026)

Natural Language Fine-Tuning
by: Liu, Jia, et al.
Published: (2024)

Self-Supervised On-Policy Distillation for Reasoning Language Models
by: Tan, Zhiquan, et al.
Published: (2026)

Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion
by: Gu, Hengrui, et al.
Published: (2024)

Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models
by: Zhang, Changshuo
Published: (2026)

M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
by: Bai, Bizhe, et al.
Published: (2025)

Remote Training in Task-Oriented Communication: Supervised or Self-Supervised with Fine-Tuning?
by: Li, Hongru, et al.
Published: (2025)

Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
by: Wang, Changsheng, et al.
Published: (2025)

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
by: Liu, Wanlong, et al.
Published: (2025)

RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies
by: Garcia-Cobo, Guillermo, et al.
Published: (2025)