:: Library Catalog

Bibliographic Details
Main Authors:	Gu, Weizheng, Li, Chengze, Yu, Zhuohao, Sun, Mengyuan, Yang, Zhibang, Wang, Wei, Jia, Hongrui, Zhang, Shikun, Ye, Wei
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.01611

Similar Items

SteerRM: Debiasing Reward Models via Sparse Autoencoders
by: Sun, Mengyuan, et al.
Published: (2026)

SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders
by: Yu, Zhuohao, et al.
Published: (2025)

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)

RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)

Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
by: Chen, Wei, et al.
Published: (2026)

KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)

TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2026)

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
by: Wu, Yongliang, et al.
Published: (2025)

Hyper-STTN: Hypergraph Augmented Spatial-Temporal Transformer Network for Trajectory Prediction
by: Wang, Weizheng, et al.
Published: (2024)

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
by: Wang, Ruheng, et al.
Published: (2025)

Mitigating Spurious Correlations with Causal Logit Perturbation
by: Zhou, Xiaoling, et al.
Published: (2025)

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
by: Hu, Yuelin, et al.
Published: (2026)

What Do Latent Action Models Actually Learn?
by: Zhang, Chuheng, et al.
Published: (2025)

ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models
by: Li, Chengze, et al.
Published: (2026)

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
by: Kong, Deyang, et al.
Published: (2025)

SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
by: Kim, Gyuhak, et al.
Published: (2025)

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data
by: Jia, Zeyu, et al.
Published: (2024)

An update to PYRO-NN: A Python Library for Differentiable CT Operators
by: Schneider, Linda-Sophie, et al.
Published: (2025)

Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework
by: Jia, Hongrui, et al.
Published: (2025)

Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)

Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)

From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism
by: Yu, Zhuohao, et al.
Published: (2026)

Debunk the Myth of SFT Generalization
by: Lin, Xiaofeng, et al.
Published: (2025)

Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL
by: Qiu, Rihong, et al.
Published: (2025)

Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models
by: Strozzi, Igor
Published: (2026)

Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials
by: He, Mingguo, et al.
Published: (2023)

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT
by: Koh, Woosung, et al.
Published: (2026)

Learning Wavelet-Sparse FDK for 3D Cone-Beam CT Reconstruction
by: Sun, Yipeng, et al.
Published: (2025)

Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning
by: Gu, Run, et al.
Published: (2025)

Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)

PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection
by: Zhong, Zhijie, et al.
Published: (2024)

Model Generalization on Text Attribute Graphs: Principles with Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)

Continual SFT Matches Multimodal RLHF with Negative Supervision
by: Zhu, Ke, et al.
Published: (2024)

Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy
by: Cai, Ruichu, et al.
Published: (2024)

Enhancing In-Context Learning via Implicit Demonstration Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)

Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning
by: Zhu, Taojie, et al.
Published: (2026)

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
by: Liu, Zihan, et al.
Published: (2025)

PLATONT: Learning a Platonic Representation for Unified Network Tomography
by: Du, Chengze, et al.
Published: (2025)

Logarithmic Regret for Online KL-Regularized Reinforcement Learning
by: Zhao, Heyang, et al.
Published: (2025)