Bibliographic Details
Main Authors: Zhu, Bingwen, Jiang, Yudong, Xu, Baohan, Yang, Siqian, Yin, Mingyu, Wu, Yidi, Sun, Huyang, Wu, Zuxuan
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2504.10044
_version_ 1866908418097283072
author Zhu, Bingwen
Jiang, Yudong
Xu, Baohan
Yang, Siqian
Yin, Mingyu
Wu, Yidi
Sun, Huyang
Wu, Zuxuan
contents Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment. Specifically, we construct the first multi-dimensional reward dataset for anime videos, comprising 30k human-annotated samples that incorporate human preferences for both visual appearance and visual consistency. Based on this, we develop AnimeReward, a powerful reward model that employs specialized vision-language models for different evaluation dimensions to guide preference alignment. Furthermore, we introduce Gap-Aware Preference Optimization (GAPO), a novel training method that explicitly incorporates preference gaps into the optimization process, enhancing alignment performance and efficiency. Extensive experimental results show that AnimeReward outperforms existing reward models, and the inclusion of GAPO leads to superior alignment in both quantitative benchmarks and human evaluations, demonstrating the effectiveness of our pipeline in enhancing anime video quality. Our code and dataset are publicly available at https://github.com/bilibili/Index-anisora.
format Preprint
id arxiv_https___arxiv_org_abs_2504_10044
institution arXiv
publishDate 2025
record_format arxiv
title Aligning Anime Video Generation with Human Feedback
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.10044