| Main Authors: | Zhu, Bingwen, Jiang, Yudong, Xu, Baohan, Yang, Siqian, Yin, Mingyu, Wu, Yidi, Sun, Huyang, Wu, Zuxuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.10044 |
Similar Items
AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
by: Jiang, Yudong, et al.
Published: (2024)
GenRec: Unifying Video Generation and Recognition with Diffusion Models
by: Weng, Zejia, et al.
Published: (2024)
Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference
by: Leng, Jiaqi, et al.
Published: (2026)
AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era
by: Zhu, Chenyang, et al.
Published: (2025)
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
by: Wang, Junke, et al.
Published: (2024)
AnimeAdapter: Fine-grained and Consistent Zero-shot Anime Character Generation
by: Han, Yixuan
Published: (2026)
E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought
by: Sun, Meiqi, et al.
Published: (2026)
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
by: Ling, Xinran, et al.
Published: (2025)
GEditBench v2: A Human-Aligned Benchmark for General Image Editing
by: Jiang, Zhangqi, et al.
Published: (2026)
OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
by: Wu, Keming, et al.
Published: (2026)
Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning
by: You, Zuyao, et al.
Published: (2025)
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
by: Qiu, Lu, et al.
Published: (2025)
Learning Accurate Segmentation Purely from Self-Supervision
by: You, Zuyao, et al.
Published: (2026)
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
by: Tian, Rui, et al.
Published: (2024)
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
by: Cheng, Junhao, et al.
Published: (2025)
Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
by: Zhao, Haoyu, et al.
Published: (2025)
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
by: Zhu, Bingwen, et al.
Published: (2026)
DCDM: Divide-and-Conquer Diffusion Models for Consistency-Preserving Video Generation
by: Zhao, Haoyu, et al.
Published: (2026)
Zero-shot High-fidelity and Pose-controllable Character Animation
by: Zhu, Bingwen, et al.
Published: (2024)
Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
by: Chen, Xu, et al.
Published: (2026)
AnimeColor: Reference-based Animation Colorization with Diffusion Transformers
by: Zhang, Yuhong, et al.
Published: (2025)
Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
by: Tu, Shuyuan, et al.
Published: (2026)
NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction
by: Wang, Hongsheng, et al.
Published: (2024)
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
by: Zhao, Haoyu, et al.
Published: (2023)
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
by: Tu, Shuyuan, et al.
Published: (2025)
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
by: Sun, Zhihao, et al.
Published: (2026)
Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization
by: Liu, Zhuohan, et al.
Published: (2026)
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
by: Zhang, Miaosen, et al.
Published: (2024)
Aligning Human Motion Generation with Human Perceptions
by: Wang, Haoru, et al.
Published: (2024)
Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
by: He, Zhihao, et al.
Published: (2025)
RefAlign: Representation Alignment for Reference-to-Video Generation
by: Wang, Lei, et al.
Published: (2026)
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
by: Li, Yunhao, et al.
Published: (2025)
Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)
Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
by: Zou, Zichen, et al.
Published: (2026)
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
by: Chen, Haoran, et al.
Published: (2022)
GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting
by: Feng, Qijun, et al.
Published: (2024)
CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization
by: Chen, Yitong, et al.
Published: (2026)