:: Library Catalog

Bibliographic Details
Main Authors:	Mai, Jinjie; Wang, Chaoyang; Qian, Guocheng Gordon; Menapace, Willi; Tulyakov, Sergey; Ghanem, Bernard; Wonka, Peter; Mirzaei, Ashkan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.16920

Similar Items

4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
by: Wang, Chaoyang, et al.
Published: (2025)

ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
by: Yenphraphai, Jiraphon, et al.
Published: (2025)

EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
by: Li, Runjia, et al.
Published: (2025)

Hierarchical Patch Diffusion Models for High-Resolution Video Generation
by: Skorokhodov, Ivan, et al.
Published: (2024)

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
by: Wu, Ziyi, et al.
Published: (2025)

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
by: Wang, Chaoyang, et al.
Published: (2024)

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
by: Bahmani, Sherwin, et al.
Published: (2024)

DELTAv2: Accelerating Dense 3D Tracking
by: Ngo, Tuan Duc, et al.
Published: (2025)

Vivid-ZOO: Multi-View Video Generation with Diffusion Model
by: Li, Bing, et al.
Published: (2024)

Can Text-to-Video Generation help Video-Language Alignment?
by: Zanella, Luca, et al.
Published: (2025)

VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
by: Gu, Jing, et al.
Published: (2024)

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
by: Bahmani, Sherwin, et al.
Published: (2024)

VIMI: Grounding Video Generation through Multi-modal Instruction
by: Fang, Yuwei, et al.
Published: (2024)

Diffusion Priors for Dynamic View Synthesis from Monocular Videos
by: Wang, Chaoyang, et al.
Published: (2024)

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
by: Yu, Heng, et al.
Published: (2024)

SF-V: Single Forward Video Generation Model
by: Zhang, Zhixing, et al.
Published: (2024)

Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning
by: Wang, Yifan, et al.
Published: (2026)

Dynamic Concepts Personalization from Single Videos
by: Abdal, Rameen, et al.
Published: (2025)

H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
by: Wu, Yushu, et al.
Published: (2025)

Helix4D: Complex 4D Mesh Generation
by: Yenphraphai, Jiraphon, et al.
Published: (2026)

Mind the Time: Temporally-Controlled Multi-Event Video Generation
by: Wu, Ziyi, et al.
Published: (2024)

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
by: Hamdi, Abdullah, et al.
Published: (2024)

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
by: Fan, Xiang, et al.
Published: (2025)

TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
by: Mai, Jinjie, et al.
Published: (2024)

AlphaFlow: Understanding and Improving MeanFlow Models
by: Zhang, Huijie, et al.
Published: (2025)

Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models
by: Menapace, Willi, et al.
Published: (2023)

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
by: Haji-Ali, Moayed, et al.
Published: (2024)

Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
by: Qian, Guocheng, et al.
Published: (2022)

Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2025)

Improving Progressive Generation with Decomposable Flow Matching
by: Haji-Ali, Moayed, et al.
Published: (2025)

SPAD : Spatially Aware Multiview Diffusers
by: Kant, Yash, et al.
Published: (2024)

Improving the Diffusability of Autoencoders
by: Skorokhodov, Ivan, et al.
Published: (2025)

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
by: Kag, Anil, et al.
Published: (2024)

LayerComposer: Multi-Human Personalized Generation via Layered Canvas
by: Qian, Guocheng Gordon, et al.
Published: (2025)

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
by: Chen, Tsai-Shien, et al.
Published: (2025)

Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
by: Mai, Jinjie, et al.
Published: (2024)

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)

Can Video Diffusion Model Reconstruct 4D Geometry?
by: Mai, Jinjie, et al.
Published: (2025)

T2Bs: Text-to-Character Blendshapes via Video Generation
by: Luo, Jiahao, et al.
Published: (2025)

NearID: Identity Representation Learning via Near-identity Distractors
by: Cvejic, Aleksandar, et al.
Published: (2026)