| Main Authors: | Hou, Wenjin, Peng, Shangpin, Wang, Weinong, Ruan, Zheng, Zhang, Yue, Zhou, Zhenglin, Gao, Mingqi, Chen, Yifei, Wang, Kaiqi, Yang, Hongming, Zhang, Chengquan, Tian, Zhuotao, Hu, Han, Yang, Yi, Wu, Fei, Fan, Hehe |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.03677 |
Similar Items
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
by: Peng, Shangpin, et al.
Published: (2025)
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models
by: Li, Quanhao, et al.
Published: (2026)
Mitigating Object Hallucinations via Sentence-Level Early Intervention
by: Peng, Shangpin, et al.
Published: (2025)
MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
by: Wang, Jianze, et al.
Published: (2026)
Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
by: Yang, Zhicheng, et al.
Published: (2026)
Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)
Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues
by: Hou, Wenjin, et al.
Published: (2026)
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
by: Zhou, Yuhang, et al.
Published: (2026)
OPD+: Rethinking the Advantage Design for On-Policy Distillation
by: Zhao, Hanyang, et al.
Published: (2026)
Adversarial Dual On-Policy Distillation from Expressive Teacher
by: Wan, Zhenglin, et al.
Published: (2026)
Flow-OPD: On-Policy Distillation for Flow Matching Models
by: Fang, Zhen, et al.
Published: (2026)
Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
by: Tang, Zhengyang, et al.
Published: (2026)
DP-OPD: Differentially Private On-Policy Distillation for Language Models
by: Khadem, Fatemeh, et al.
Published: (2026)
$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
by: Chen, Xianwei, et al.
Published: (2026)
Deepfake Detection Generalization with Diffusion Noise
by: Qi, Hongyuan, et al.
Published: (2026)
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
by: Zhou, Zhenglin, et al.
Published: (2024)
EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation
by: Lazaridis, Aristotelis, et al.
Published: (2026)
Prompt-Aware Controllable Shadow Removal
by: Chen, Kerui, et al.
Published: (2025)
ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning
by: Hou, Wenjin, et al.
Published: (2024)
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
by: Fan, Hehe, et al.
Published: (2025)
Depictor: Topic‐Guided Opinion Summarization for Product Reviews With Dual‐Perspective Topic Modeling
by: Yanyue Zhang, et al.
Published: (2026)
X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
by: Cao, Di, et al.
Published: (2026)
PhoneWorld: Scaling Phone-Use Agent Environments
by: Tang, Zhengyang, et al.
Published: (2026)
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
by: Li, Yaxuan, et al.
Published: (2026)
Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts
by: Zhang, Yue, et al.
Published: (2025)
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
by: Wu, Yecheng, et al.
Published: (2026)
VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation
by: Zhong, Zhide, et al.
Published: (2026)
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
by: Wang, Yifei, et al.
Published: (2025)
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
by: Wang, Wenbo, et al.
Published: (2024)
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
by: Yuan, Qianhao, et al.
Published: (2026)
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
by: Zhou, Zhenglin, et al.
Published: (2025)
Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
by: Zhou, Zhenglin, et al.
Published: (2025)
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
by: Zhou, Zhenglin, et al.
Published: (2025)
ANO: A Principled Approach to Robust Policy Optimization
by: Zhang, Yiheng, et al.
Published: (2026)
Unified Language-driven Zero-shot Domain Adaptation
by: Yang, Senqiao, et al.
Published: (2024)
Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)
SedarEval: Automated Evaluation using Self-Adaptive Rubrics
by: Fan, Zhiyuan, et al.
Published: (2025)