| Main Authors: | Lin, Guang, Tu, Shikui, Xu, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.01220 |
Similar Items
Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge
by: Yang, Yupei, et al.
Published: (2024)
THFlow: A Temporally Hierarchical Flow Matching Framework for 3D Peptide Design
by: Huang, Dengdeng, et al.
Published: (2025)
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models
by: Chen, Jingyi, et al.
Published: (2024)
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
by: Zhao, Hanyang, et al.
Published: (2025)
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
by: Huang, Zeyu, et al.
Published: (2025)
ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning
by: Sun, Zhishen, et al.
Published: (2026)
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models
by: Uehara, Masatoshi, et al.
Published: (2024)
Thompson Sampling via Fine-Tuning of LLMs
by: Menet, Nicolas, et al.
Published: (2025)
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
by: Anil, Gautham Govind, et al.
Published: (2025)
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review
by: Uehara, Masatoshi, et al.
Published: (2024)
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
by: Dang, Sizhe, et al.
Published: (2025)
dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
by: Chen, Shirui, et al.
Published: (2025)
A Survey on Parameter-Efficient Fine-Tuning for Foundation Models in Federated Learning
by: Bian, Jieming, et al.
Published: (2025)
Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
by: Mao, Liyuan, et al.
Published: (2024)
Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
by: Ma, Hao, et al.
Published: (2026)
Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
by: Balashov, Andrii
Published: (2025)
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
by: Zheng, Haoyang, et al.
Published: (2025)
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
by: Zhang, Honglin, et al.
Published: (2025)
RPO: Reinforcement Fine-Tuning with Partial Reasoning Optimization
by: Yi, Hongzhu, et al.
Published: (2026)
Reward Sharpness-Aware Fine-Tuning for Diffusion Models
by: Kim, Kwanyoung, et al.
Published: (2026)
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
by: Zhao, Yanxiao, et al.
Published: (2025)
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
by: Fu, Yuqian, et al.
Published: (2025)
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
by: Khadangi, Afshin, et al.
Published: (2025)
Supervised Fine-Tuning as Inverse Reinforcement Learning
by: Sun, Hao
Published: (2024)
RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning
by: Liu, Zehua, et al.
Published: (2026)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
by: Wang, Shumin, et al.
Published: (2026)
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning
by: Bozkurt, Alper Kamil, et al.
Published: (2026)
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
by: Qin, Chongli, et al.
Published: (2025)
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
by: Wang, Chenyu, et al.
Published: (2024)
DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning
by: Zhao, Hanyang, et al.
Published: (2025)
iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use
by: Zeng, Yirong, et al.
Published: (2025)
Towards Fast Safe Online Reinforcement Learning via Policy Finetuning
by: Chen, Keru, et al.
Published: (2024)
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
by: Ma, Lu, et al.
Published: (2025)
Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time
by: Chen, Zixiang, et al.
Published: (2023)
FRoD: Full-Rank Efficient Fine-Tuning with Rotational Degrees for Fast Convergence
by: Wan, Guoan, et al.
Published: (2025)
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
by: Sanyal, Sunny, et al.
Published: (2025)
Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control
by: Uehara, Masatoshi, et al.
Published: (2024)
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
by: Kang, Hyeongyu, et al.
Published: (2025)
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
by: Zheng, Han, et al.
Published: (2026)