Bibliographic Details
Main Authors:	Zhu, Bingwen; Jiang, Yudong; Xu, Baohan; Yang, Siqian; Yin, Mingyu; Wu, Yidi; Sun, Huyang; Wu, Zuxuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.10044

_version_	1866908418097283072
author	Zhu, Bingwen; Jiang, Yudong; Xu, Baohan; Yang, Siqian; Yin, Mingyu; Wu, Yidi; Sun, Huyang; Wu, Zuxuan
author_facet	Zhu, Bingwen; Jiang, Yudong; Xu, Baohan; Yang, Siqian; Yin, Mingyu; Wu, Yidi; Sun, Huyang; Wu, Zuxuan
contents	Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment. Specifically, we construct the first multi-dimensional reward dataset for anime videos, comprising 30k human-annotated samples that incorporate human preferences for both visual appearance and visual consistency. Based on this, we develop AnimeReward, a powerful reward model that employs specialized vision-language models for different evaluation dimensions to guide preference alignment. Furthermore, we introduce Gap-Aware Preference Optimization (GAPO), a novel training method that explicitly incorporates preference gaps into the optimization process, enhancing alignment performance and efficiency. Extensive experimental results show that AnimeReward outperforms existing reward models, and the inclusion of GAPO leads to superior alignment in both quantitative benchmarks and human evaluations, demonstrating the effectiveness of our pipeline in enhancing anime video quality. Our code and dataset are publicly available at https://github.com/bilibili/Index-anisora.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_10044
institution	arXiv
publishDate	2025
record_format	arxiv
title	Aligning Anime Video Generation with Human Feedback
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.10044
