:: Library Catalog

Bibliographic Details
Main Authors:	Bian, Yuxuan, Chen, Xin, Li, Zenan, Zhi, Tiancheng, Sang, Shen, Luo, Linjie, Xu, Qiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.20888

Similar Items

Plan-X: Instruct Video Generation via Semantic Planning
by: Huang, Lun, et al.
Published: (2025)

Lynx: Towards High-Fidelity Personalized Video Generation
by: Sang, Shen, et al.
Published: (2025)

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
by: Bian, Yuxuan, et al.
Published: (2025)

X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents
by: Song, Guoxian, et al.
Published: (2025)

Bridging Your Imagination with Audio-Video Generation via a Unified Director
by: Zhang, Jiaxu, et al.
Published: (2025)

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
by: Xiao, Jinqi, et al.
Published: (2024)

Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
by: Gillman, Nate, et al.
Published: (2025)

OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)

VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
by: Guo, Jiangyuan, et al.
Published: (2024)

VideoCogQA: A Controllable Benchmark for Evaluating Cognitive Abilities in Video-Language Models
by: Li, Chenglin, et al.
Published: (2024)

GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model
by: Fu, Yongjie, et al.
Published: (2024)

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
by: Qin, Luozheng, et al.
Published: (2026)

PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
by: Wu, Shang, et al.
Published: (2026)

Learning Feature-Preserving Portrait Editing from Generated Pairs
by: Chen, Bowei, et al.
Published: (2024)

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
by: Wang, Zhouxia, et al.
Published: (2023)

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
by: Yan, Xin, et al.
Published: (2024)

Controllable Video Generation with Provable Disentanglement
by: Shen, Yifan, et al.
Published: (2025)

VideoGUI: A Benchmark for GUI Automation from Instructional Videos
by: Lin, Kevin Qinghong, et al.
Published: (2024)

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation
by: Fu, Junhu, et al.
Published: (2026)

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)

InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation
by: Xiao, Jinqi, et al.
Published: (2025)

Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)

Domain Adaptation of VLM for Soccer Video Understanding
by: Jiang, Tiancheng, et al.
Published: (2025)

S2DM: Sector-Shaped Diffusion Models for Video Generation
by: Lang, Haoran, et al.
Published: (2024)

EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
by: Niu, Tian-Zi, et al.
Published: (2024)

MemCam: Memory-Augmented Camera Control for Consistent Video Generation
by: Gao, Xinhang, et al.
Published: (2026)

Space-time Reinforcement Network for Video Object Segmentation
by: Chen, Yadang, et al.
Published: (2024)

LayerT2V: A Unified Multi-Layer Video Generation Framework
by: Li, Guangzhao, et al.
Published: (2025)

Conditional Video Generation for High-Efficiency Video Compression
by: Yi, Fangqiu, et al.
Published: (2025)

Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)

A Mechanistic View on Video Generation as World Models: State and Dynamics
by: Wang, Luozhou, et al.
Published: (2026)

Learning Spatial-Semantic Features for Robust Video Object Segmentation
by: Li, Xin, et al.
Published: (2024)

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
by: Wang, Ruotong, et al.
Published: (2025)

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)

Semantic Generative Tuning for Unified Multimodal Models
by: Yu, Songsong, et al.
Published: (2026)

Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)

EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation
by: Qu, Qiang, et al.
Published: (2025)

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
by: Mei, Yuting, et al.
Published: (2024)