| Main authors: | Mai, Jinjie; Wang, Chaoyang; Qian, Guocheng Gordon; Menapace, Willi; Tulyakov, Sergey; Ghanem, Bernard; Wonka, Peter; Mirzaei, Ashkan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2512.16920 |
Similar items
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
by: Wang, Chaoyang, et al.
Published: (2025)
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
by: Yenphraphai, Jiraphon, et al.
Published: (2025)
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
by: Li, Runjia, et al.
Published: (2025)
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
by: Skorokhodov, Ivan, et al.
Published: (2024)
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
by: Wu, Ziyi, et al.
Published: (2025)
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
by: Wang, Chaoyang, et al.
Published: (2024)
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
by: Bahmani, Sherwin, et al.
Published: (2024)
DELTAv2: Accelerating Dense 3D Tracking
by: Ngo, Tuan Duc, et al.
Published: (2025)
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
by: Li, Bing, et al.
Published: (2024)
Can Text-to-Video Generation help Video-Language Alignment?
by: Zanella, Luca, et al.
Published: (2025)
VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
by: Gu, Jing, et al.
Published: (2024)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
by: Bahmani, Sherwin, et al.
Published: (2024)
VIMI: Grounding Video Generation through Multi-modal Instruction
by: Fang, Yuwei, et al.
Published: (2024)
Diffusion Priors for Dynamic View Synthesis from Monocular Videos
by: Wang, Chaoyang, et al.
Published: (2024)
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
by: Yu, Heng, et al.
Published: (2024)
SF-V: Single Forward Video Generation Model
by: Zhang, Zhixing, et al.
Published: (2024)
Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning
by: Wang, Yifan, et al.
Published: (2026)
Dynamic Concepts Personalization from Single Videos
by: Abdal, Rameen, et al.
Published: (2025)
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
by: Wu, Yushu, et al.
Published: (2025)
Helix4D: Complex 4D Mesh Generation
by: Yenphraphai, Jiraphon, et al.
Published: (2026)
Mind the Time: Temporally-Controlled Multi-Event Video Generation
by: Wu, Ziyi, et al.
Published: (2024)
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
by: Hamdi, Abdullah, et al.
Published: (2024)
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
by: Fan, Xiang, et al.
Published: (2025)
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
by: Mai, Jinjie, et al.
Published: (2024)
AlphaFlow: Understanding and Improving MeanFlow Models
by: Zhang, Huijie, et al.
Published: (2025)
Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models
by: Menapace, Willi, et al.
Published: (2023)
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
by: Haji-Ali, Moayed, et al.
Published: (2024)
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
by: Qian, Guocheng, et al.
Published: (2022)
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2025)
Improving Progressive Generation with Decomposable Flow Matching
by: Haji-Ali, Moayed, et al.
Published: (2025)
SPAD : Spatially Aware Multiview Diffusers
by: Kant, Yash, et al.
Published: (2024)
Improving the Diffusability of Autoencoders
by: Skorokhodov, Ivan, et al.
Published: (2025)
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
by: Kag, Anil, et al.
Published: (2024)
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
by: Qian, Guocheng Gordon, et al.
Published: (2025)
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
by: Chen, Tsai-Shien, et al.
Published: (2025)
Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
by: Mai, Jinjie, et al.
Published: (2024)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)
Can Video Diffusion Model Reconstruct 4D Geometry?
by: Mai, Jinjie, et al.
Published: (2025)
T2Bs: Text-to-Character Blendshapes via Video Generation
by: Luo, Jiahao, et al.
Published: (2025)
NearID: Identity Representation Learning via Near-identity Distractors
by: Cvejic, Aleksandar, et al.
Published: (2026)