| Main Authors: | Qiu, Longtian, Ning, Shan, Zhang, Chuyu, Sun, Jiaxuan, He, Xuming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.00623 |
Similar Items
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)
WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition
by: Ning, Shan, et al.
Published: (2026)
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation
by: Qiu, Longtian, et al.
Published: (2025)
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
by: Ning, Shan, et al.
Published: (2026)
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
by: Qi, Xuan, et al.
Published: (2025)
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
by: Zhang, Chuyu, et al.
Published: (2025)
$β$-DPO: Direct Preference Optimization with Dynamic $β$
by: Wu, Junkang, et al.
Published: (2024)
Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
by: Jia, Sihang, et al.
Published: (2026)
EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization
by: Rong, Dingyi, et al.
Published: (2025)
TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization
by: Abdullah, Abdulhady Abas, et al.
Published: (2026)
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
by: Xie, Yuxi, et al.
Published: (2024)
2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization
by: Li, Mengyang, et al.
Published: (2025)
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
by: Li, Shilong, et al.
Published: (2024)
C2-DPO: Constrained Controlled Direct Preference Optimization
by: Asadi, Kavosh, et al.
Published: (2025)
AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
by: Jiang, Zixuan, et al.
Published: (2025)
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing
by: Qi, Biqing, et al.
Published: (2024)
Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts
by: Gupta, Taneesh, et al.
Published: (2024)
$ξ$-DPO: Direct Preference Optimization via Ratio Reward Margin
by: Fan, Zhengyuan, et al.
Published: (2026)
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
by: Lin, Xiaoqiang, et al.
Published: (2025)
Graph Unlearning Meets Influence-aware Negative Preference Optimization
by: Chen, Qiang, et al.
Published: (2025)
Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
by: Compagnoni, Alberto, et al.
Published: (2025)
DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
by: Liu, Ziyu, et al.
Published: (2024)
PKG-DPO: Optimizing Domain-Specific AI systems with Physics Knowledge Graphs and Direct Preference Optimization
by: Kulkarni, Nitin Nagesh, et al.
Published: (2025)
RealDPO: Real or Not Real, that is the Preference
by: Cheng, Guo, et al.
Published: (2025)
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
by: Kim, Geon-Hyeong, et al.
Published: (2025)
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
by: Rho, Hyung Gyu
Published: (2025)
Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering
by: Mohamed, Anas, et al.
Published: (2025)
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
by: Xu, Ruijie, et al.
Published: (2024)
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
by: Pandit, Shrey, et al.
Published: (2025)
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
by: Zhang, Yangsong, et al.
Published: (2026)
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
by: Peng, Shangpin, et al.
Published: (2025)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
by: Cho, Jay Hyeon, et al.
Published: (2025)
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
by: Lai, Xin, et al.
Published: (2024)
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
by: Du, Jie, et al.
Published: (2025)
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
by: Liu, Runtao, et al.
Published: (2024)
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection
by: Huang, Yiming, et al.
Published: (2025)