:: Library Catalog

Bibliographic Details
Main Author:	Strozzi, Igor
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.11907

Similar Items

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT
by: Koh, Woosung, et al.
Published: (2026)

TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2026)

Debunk the Myth of SFT Generalization
by: Lin, Xiaofeng, et al.
Published: (2025)

What Do Agents Learn from Trajectory-SFT: Semantics or Interfaces?
by: Gu, Weizheng, et al.
Published: (2026)

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)

Crowd-SFT: Crowdsourcing for LLM Alignment
by: Sotiropoulos, Alex, et al.
Published: (2025)

Crafting Reversible SFT Behaviors in Large Language Models
by: Lin, Yuping, et al.
Published: (2026)

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
by: Li, Jiaxiang, et al.
Published: (2024)

A Three-Dimensional SFT with Sparse Columns
by: Salo, Ville, et al.
Published: (2025)

Simplified SFT moduli spaces for Legendrian links
by: Avdek, Russell
Published: (2021)

SFT covers for actions of the first Grigorchuk group
by: Grigorchuk, Rostislav, et al.
Published: (2024)

Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity
by: Quan, Shanghaoran
Published: (2024)

Symbolic Momentum Conservation and Curvature Entanglement in a Recursive Universe: A Force-Based Reality Framework (SFT-FBRF-8) Eighth Research Paper of the SFT-FBRF Series
by: Janakaraj, Sivaram
Published: (2025)

A landscape of contact manifolds via rational SFT
by: Moreno, Agustin, et al.
Published: (2020)

RL makes MLLMs see better than SFT
by: Song, Junha, et al.
Published: (2025)

RL Fine-Tuning Heals OOD Forgetting in SFT
by: Jin, Hangzhan, et al.
Published: (2025)

SFT for ASD: A systemic intervention for neurodiverse families
by: Anthony Pennant
Published: (2024)

Continual SFT Matches Multimodal RLHF with Negative Supervision
by: Zhu, Ke, et al.
Published: (2024)

Learning to Adapt SFT Data for Better Reasoning Generalization
by: Sun, Lisong, et al.
Published: (2026)

On Countable SFT Covers of Sparse Multidimensional Shift Spaces
by: Törmä, Ilkka
Published: (2024)

Qwen3.5-Omni Technical Report
by: Qwen Team
Published: (2026)

An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models
by: Feng, Yuming, et al.
Published: (2026)

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
by: Zhou, Hengguang, et al.
Published: (2025)

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)

SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning
by: Chen, Yijie, et al.
Published: (2026)

Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
by: Ou, Linyu, et al.
Published: (2025)

Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning
by: Zhu, Taojie, et al.
Published: (2026)

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
by: Limozin, Alexis, et al.
Published: (2026)

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective
by: Zhang, Junpeng, et al.
Published: (2026)

DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections
by: Shin, Haebin, et al.
Published: (2025)

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
by: Wu, Yongliang, et al.
Published: (2025)

Comments on resolution of nonassociativity in SFT: an example from axioms of BCFT
by: Matsuo Yutaka
Published: (2002)

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
by: Hong, Joey, et al.
Published: (2024)

SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
by: Kim, Gyuhak, et al.
Published: (2025)

Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
by: Qiu, Haibo, et al.
Published: (2025)

Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization
by: Zhang, Daoan, et al.
Published: (2024)

Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning
by: Peng, Ruiying, et al.
Published: (2026)

Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives
by: Wang, Zecheng, et al.
Published: (2026)

RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
by: Wang, Zhichao, et al.
Published: (2025)