| Main Authors: | Bian, Yuxuan, Chen, Xin, Li, Zenan, Zhi, Tiancheng, Sang, Shen, Luo, Linjie, Xu, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.20888 |
Similar Items
Plan-X: Instruct Video Generation via Semantic Planning
by: Huang, Lun, et al.
Published: (2025)
Lynx: Towards High-Fidelity Personalized Video Generation
by: Sang, Shen, et al.
Published: (2025)
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
by: Bian, Yuxuan, et al.
Published: (2025)
X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents
by: Song, Guoxian, et al.
Published: (2025)
Bridging Your Imagination with Audio-Video Generation via a Unified Director
by: Zhang, Jiaxu, et al.
Published: (2025)
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
by: Xiao, Jinqi, et al.
Published: (2024)
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
by: Gillman, Nate, et al.
Published: (2025)
OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)
VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
by: Guo, Jiangyuan, et al.
Published: (2024)
VideoCogQA: A Controllable Benchmark for Evaluating Cognitive Abilities in Video-Language Models
by: Li, Chenglin, et al.
Published: (2024)
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model
by: Fu, Yongjie, et al.
Published: (2024)
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
by: Qin, Luozheng, et al.
Published: (2026)
PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
by: Wu, Shang, et al.
Published: (2026)
Learning Feature-Preserving Portrait Editing from Generated Pairs
by: Chen, Bowei, et al.
Published: (2024)
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
by: Wang, Zhouxia, et al.
Published: (2023)
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
by: Yan, Xin, et al.
Published: (2024)
Controllable Video Generation with Provable Disentanglement
by: Shen, Yifan, et al.
Published: (2025)
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
by: Lin, Kevin Qinghong, et al.
Published: (2024)
DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation
by: Fu, Junhu, et al.
Published: (2026)
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation
by: Xiao, Jinqi, et al.
Published: (2025)
Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)
Domain Adaptation of VLM for Soccer Video Understanding
by: Jiang, Tiancheng, et al.
Published: (2025)
S2DM: Sector-Shaped Diffusion Models for Video Generation
by: Lang, Haoran, et al.
Published: (2024)
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
by: Niu, Tian-Zi, et al.
Published: (2024)
MemCam: Memory-Augmented Camera Control for Consistent Video Generation
by: Gao, Xinhang, et al.
Published: (2026)
Space-time Reinforcement Network for Video Object Segmentation
by: Chen, Yadang, et al.
Published: (2024)
LayerT2V: A Unified Multi-Layer Video Generation Framework
by: Li, Guangzhao, et al.
Published: (2025)
Conditional Video Generation for High-Efficiency Video Compression
by: Yi, Fangqiu, et al.
Published: (2025)
Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)
A Mechanistic View on Video Generation as World Models: State and Dynamics
by: Wang, Luozhou, et al.
Published: (2026)
Learning Spatial-Semantic Features for Robust Video Object Segmentation
by: Li, Xin, et al.
Published: (2024)
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
by: Wang, Ruotong, et al.
Published: (2025)
SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)
Semantic Generative Tuning for Unified Multimodal Models
by: Yu, Songsong, et al.
Published: (2026)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation
by: Qu, Qiang, et al.
Published: (2025)
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
by: Mei, Yuting, et al.
Published: (2024)