:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Bingwen; Jiang, Yudong; Xu, Baohan; Yang, Siqian; Yin, Mingyu; Wu, Yidi; Sun, Huyang; Wu, Zuxuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.10044

Similar Items

AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
by: Jiang, Yudong, et al.
Published: (2024)

GenRec: Unifying Video Generation and Recognition with Diffusion Models
by: Weng, Zejia, et al.
Published: (2024)

Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference
by: Leng, Jiaqi, et al.
Published: (2026)

AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era
by: Zhu, Chenyang, et al.
Published: (2025)

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
by: Wang, Junke, et al.
Published: (2024)

AnimeAdapter: Fine-grained and Consistent Zero-shot Anime Character Generation
by: Han, Yixuan
Published: (2026)

E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought
by: Sun, Meiqi, et al.
Published: (2026)

VMBench: A Benchmark for Perception-Aligned Video Motion Generation
by: Ling, Xinran, et al.
Published: (2025)

GEditBench v2: A Human-Aligned Benchmark for General Image Editing
by: Jiang, Zhangqi, et al.
Published: (2026)

OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
by: Wu, Keming, et al.
Published: (2026)

Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning
by: You, Zuyao, et al.
Published: (2025)

AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
by: Qiu, Lu, et al.
Published: (2025)

Learning Accurate Segmentation Purely from Self-Supervision
by: You, Zuyao, et al.
Published: (2026)

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
by: Tian, Rui, et al.
Published: (2024)

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
by: Cheng, Junhao, et al.
Published: (2025)

Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)

Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
by: Zhao, Haoyu, et al.
Published: (2025)

EgoSound: Benchmarking Sound Understanding in Egocentric Videos
by: Zhu, Bingwen, et al.
Published: (2026)

DCDM: Divide-and-Conquer Diffusion Models for Consistency-Preserving Video Generation
by: Zhao, Haoyu, et al.
Published: (2026)

Zero-shot High-fidelity and Pose-controllable Character Animation
by: Zhu, Bingwen, et al.
Published: (2024)

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
by: Chen, Xu, et al.
Published: (2026)

AnimeColor: Reference-based Animation Colorization with Diffusion Transformers
by: Zhang, Yuhong, et al.
Published: (2025)

Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
by: Tu, Shuyuan, et al.
Published: (2026)

NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction
by: Wang, Hongsheng, et al.
Published: (2024)

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
by: Zhao, Haoyu, et al.
Published: (2023)

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
by: Tu, Shuyuan, et al.
Published: (2025)

UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
by: Sun, Zhihao, et al.
Published: (2026)

Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization
by: Liu, Zhuohan, et al.
Published: (2026)

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
by: Zhang, Miaosen, et al.
Published: (2024)

Aligning Human Motion Generation with Human Perceptions
by: Wang, Haoru, et al.
Published: (2024)

Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
by: He, Zhihao, et al.
Published: (2025)

RefAlign: Representation Alignment for Reference-to-Video Generation
by: Wang, Lei, et al.
Published: (2026)

AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
by: Li, Yunhao, et al.
Published: (2025)

Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)

Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
by: Zou, Zichen, et al.
Published: (2026)

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
by: Chen, Haoran, et al.
Published: (2022)

GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting
by: Feng, Qijun, et al.
Published: (2024)

CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization
by: Chen, Yitong, et al.
Published: (2026)