| Main Authors: | Chen, Jiahui, Wang, Weida, Shi, Runhua, Yang, Huan, Ding, Chaofan, Chen, Zihao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.02492 |
Similar Items
Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
by: Yang, Huan, et al.
Published: (2024)
YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
by: Chen, Zihao, et al.
Published: (2024)
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)
Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks
by: Liu, Chang, et al.
Published: (2025)
Audio-driven Gesture Generation via Deviation Feature in the Latent Space
by: Chen, Jiahui, et al.
Published: (2025)
AutoMV: An Automatic Multi-Agent System for Music Video Generation
by: Tang, Xiaoxuan, et al.
Published: (2025)
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
by: Liang, Yunming, et al.
Published: (2025)
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
by: Zheng, Junjie, et al.
Published: (2025)
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)
MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing
by: Zheng, Junjie, et al.
Published: (2025)
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
by: Zuo, Qi, et al.
Published: (2024)
Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
by: He, Zhihao, et al.
Published: (2025)
MV-TAP: Tracking Any Point in Multi-View Videos
by: Koo, Jahyeok, et al.
Published: (2025)
MV2MAE: Multi-View Video Masked Autoencoders
by: Shah, Ketul, et al.
Published: (2024)
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
by: Jin, Xiaojie, et al.
Published: (2023)
X-Dancer: Expressive Music to Human Dance Video Generation
by: Chen, Zeyuan, et al.
Published: (2025)
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
by: Chen, Tieyuan, et al.
Published: (2024)
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
by: Wang, Weimin, et al.
Published: (2024)
MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)
DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation
by: Chen, Mu, et al.
Published: (2025)
VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control
by: Jiang, Lifan, et al.
Published: (2025)
MV-S2V: Multi-View Subject-Consistent Video Generation
by: Song, Ziyang, et al.
Published: (2026)
Multi-sentence Video Grounding for Long Video Generation
by: Feng, Wei, et al.
Published: (2024)
AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State
by: Wang, Huimin, et al.
Published: (2026)
FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
by: Lei, Bin, et al.
Published: (2023)
MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis
by: Zhi, Yihao, et al.
Published: (2025)
PMR: Physical Model-Driven Multi-Stage Restoration of Turbulent Dynamic Videos
by: Wu, Tao, et al.
Published: (2025)
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
by: Lin, Yan-Bo, et al.
Published: (2024)
Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
by: Di, Donglin, et al.
Published: (2024)
Versatile Transition Generation with Image-to-Video Diffusion
by: Yang, Zuhao, et al.
Published: (2025)
Controllable Generative Video Compression
by: Ding, Ding, et al.
Published: (2026)
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
by: Wang, Mengyue, et al.
Published: (2025)
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)
Enhance-A-Video: Better Generated Video for Free
by: Luo, Yang, et al.
Published: (2025)
T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning
by: Chen, Tieyuan, et al.
Published: (2025)
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
by: Chen, Xinlong, et al.
Published: (2025)
Generalizing to Out-of-Sample Degradations via Model Reprogramming
by: Jiang, Runhua, et al.
Published: (2024)
FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation
by: Ding, Ganggui, et al.
Published: (2026)
MV-Adapter: Multi-view Consistent Image Generation Made Easy
by: Huang, Zehuan, et al.
Published: (2024)