:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Jiahui; Wang, Weida; Shi, Runhua; Yang, Huan; Ding, Chaofan; Chen, Zihao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.02492

Similar Items

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
by: Yang, Huan, et al.
Published: (2024)

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
by: Chen, Zihao, et al.
Published: (2024)

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)

Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks
by: Liu, Chang, et al.
Published: (2025)

Audio-driven Gesture Generation via Deviation Feature in the Latent Space
by: Chen, Jiahui, et al.
Published: (2025)

AutoMV: An Automatic Multi-Agent System for Music Video Generation
by: Tang, Xiaoxuan, et al.
Published: (2025)

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
by: Liang, Yunming, et al.
Published: (2025)

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
by: Zheng, Junjie, et al.
Published: (2025)

LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing
by: Zheng, Junjie, et al.
Published: (2025)

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
by: Zuo, Qi, et al.
Published: (2024)

Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
by: He, Zhihao, et al.
Published: (2025)

MV-TAP: Tracking Any Point in Multi-View Videos
by: Koo, Jahyeok, et al.
Published: (2025)

MV2MAE: Multi-View Video Masked Autoencoders
by: Shah, Ketul, et al.
Published: (2024)

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
by: Jin, Xiaojie, et al.
Published: (2023)

X-Dancer: Expressive Music to Human Dance Video Generation
by: Chen, Zeyuan, et al.
Published: (2025)

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
by: Chen, Tieyuan, et al.
Published: (2024)

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
by: Wang, Weimin, et al.
Published: (2024)

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)

DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation
by: Chen, Mu, et al.
Published: (2025)

VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control
by: Jiang, Lifan, et al.
Published: (2025)

MV-S2V: Multi-View Subject-Consistent Video Generation
by: Song, Ziyang, et al.
Published: (2026)

Multi-sentence Video Grounding for Long Video Generation
by: Feng, Wei, et al.
Published: (2024)

AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State
by: Wang, Huimin, et al.
Published: (2026)

FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
by: Lei, Bin, et al.
Published: (2023)

MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis
by: Zhi, Yihao, et al.
Published: (2025)

PMR: Physical Model-Driven Multi-Stage Restoration of Turbulent Dynamic Videos
by: Wu, Tao, et al.
Published: (2025)

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
by: Lin, Yan-Bo, et al.
Published: (2024)

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
by: Di, Donglin, et al.
Published: (2024)

Versatile Transition Generation with Image-to-Video Diffusion
by: Yang, Zuhao, et al.
Published: (2025)

Controllable Generative Video Compression
by: Ding, Ding, et al.
Published: (2026)

METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
by: Wang, Mengyue, et al.
Published: (2025)

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)

Enhance-A-Video: Better Generated Video for Free
by: Luo, Yang, et al.
Published: (2025)

T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)

MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning
by: Chen, Tieyuan, et al.
Published: (2025)

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
by: Chen, Xinlong, et al.
Published: (2025)

Generalizing to Out-of-Sample Degradations via Model Reprogramming
by: Jiang, Runhua, et al.
Published: (2024)

FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation
by: Ding, Ganggui, et al.
Published: (2026)

MV-Adapter: Multi-view Consistent Image Generation Made Easy
by: Huang, Zehuan, et al.
Published: (2024)