| Main Authors: | Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, Tulyakov, Sergey |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.19479 |
Similar Items
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
by: Skorokhodov, Ivan, et al.
Published: (2024)
VIMI: Grounding Video Generation through Multi-modal Instruction
by: Fang, Yuwei, et al.
Published: (2024)
Multi-subject Open-set Personalization in Video Generation
by: Chen, Tsai-Shien, et al.
Published: (2025)
Mind the Time: Temporally-Controlled Multi-Event Video Generation
by: Wu, Ziyi, et al.
Published: (2024)
Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA
by: Abdal, Rameen, et al.
Published: (2025)
AlphaFlow: Understanding and Improving MeanFlow Models
by: Zhang, Huijie, et al.
Published: (2025)
Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models
by: Menapace, Willi, et al.
Published: (2023)
AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation
by: Girish, Sharath, et al.
Published: (2025)
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
by: Haji-Ali, Moayed, et al.
Published: (2024)
Improving Progressive Generation with Decomposable Flow Matching
by: Haji-Ali, Moayed, et al.
Published: (2025)
Dynamic Concepts Personalization from Single Videos
by: Abdal, Rameen, et al.
Published: (2025)
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
by: Wu, Ziyi, et al.
Published: (2025)
Improving the Diffusability of Autoencoders
by: Skorokhodov, Ivan, et al.
Published: (2025)
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
by: Kag, Anil, et al.
Published: (2024)
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
by: Yu, Heng, et al.
Published: (2024)
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
by: Bahmani, Sherwin, et al.
Published: (2024)
Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
by: Park, Dogyun, et al.
Published: (2025)
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
by: Wu, Yushu, et al.
Published: (2025)
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
by: Wang, Chaoyang, et al.
Published: (2024)
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
by: Haji-Ali, Moayed, et al.
Published: (2026)
Taming Data and Transformers for Audio Generation
by: Haji-Ali, Moayed, et al.
Published: (2024)
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
by: Chen, Tsai-Shien, et al.
Published: (2025)
Video Motion Transfer with Diffusion Transformers
by: Pondaven, Alexander, et al.
Published: (2024)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
by: Bahmani, Sherwin, et al.
Published: (2024)
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
by: Wang, Chaoyang, et al.
Published: (2025)
Diffusion Priors for Dynamic View Synthesis from Monocular Videos
by: Wang, Chaoyang, et al.
Published: (2024)
Taming Diffusion Transformer for Efficient Mobile Video Generation in Seconds
by: Wu, Yushu, et al.
Published: (2025)
SF-V: Single Forward Video Generation Model
by: Zhang, Zhixing, et al.
Published: (2024)
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
by: Li, Runjia, et al.
Published: (2025)
Pixel-Aligned Multi-View Generation with Depth Guided Decoder
by: Tang, Zhenggang, et al.
Published: (2024)
Can Text-to-Video Generation help Video-Language Alignment?
by: Zanella, Luca, et al.
Published: (2025)
ActionParty: Multi-Subject Action Binding in Generative Video Games
by: Pondaven, Alexander, et al.
Published: (2026)
AToM: Amortized Text-to-Mesh using 2D Diffusion
by: Qian, Guocheng, et al.
Published: (2024)
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
by: Zhuang, Peiye, et al.
Published: (2024)
Sediment echosounder raw data (Atlas Parasound P70 echosounder entire dataset) of RV METEOR during cruise M167
by: Menapace, Walter
Published: (2021)
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
by: Liu, Xian, et al.
Published: (2023)
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
by: Fan, Xiang, et al.
Published: (2025)
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
by: Qian, Guocheng Gordon, et al.
Published: (2025)
EasyV2V: A High-quality Instruction-based Video Editing Framework
by: Mai, Jinjie, et al.
Published: (2025)