| Main Authors: | Mai, Jinjie, Zhu, Wenxuan, Liu, Haozhe, Li, Bing, Zheng, Cheng, Schmidhuber, Jürgen, Ghanem, Bernard |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.21082 |
Similar Items
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
by: Li, Bing, et al.
Published: (2024)
Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
by: Liu, Haozhe, et al.
Published: (2024)
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
by: Zhu, Wenxuan, et al.
Published: (2025)
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
by: Mai, Jinjie, et al.
Published: (2024)
Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
by: Mai, Jinjie, et al.
Published: (2024)
Dynamically Masked Discriminator for Generative Adversarial Networks
by: Zhang, Wentian, et al.
Published: (2023)
EasyV2V: A High-quality Instruction-based Video Editing Framework
by: Mai, Jinjie, et al.
Published: (2025)
PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models
by: Li, Pengbo, et al.
Published: (2025)
Faster Diffusion via Temporal Attention Decomposition
by: Liu, Haozhe, et al.
Published: (2024)
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
by: Liu, Shuming, et al.
Published: (2025)
Video Self-Stitching Graph Network for Temporal Action Localization
by: Zhao, Chen, et al.
Published: (2020)
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
by: Hamdi, Abdullah, et al.
Published: (2024)
Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation
by: Liu, Lingyu, et al.
Published: (2026)
GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction
by: Xu, Chao, et al.
Published: (2026)
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
by: Qian, Guocheng, et al.
Published: (2022)
Test-Time Adaptation for Combating Missing Modalities in Egocentric Videos
by: Ramazanova, Merey, et al.
Published: (2024)
RoboTransfer: Controllable Geometry-Consistent Video Diffusion for Manipulation Policy Transfer
by: Liu, Liu, et al.
Published: (2025)
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
by: Zhang, Yanran, et al.
Published: (2025)
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
by: Liu, Haozhe, et al.
Published: (2024)
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
by: Thoker, Fida Mohammad, et al.
Published: (2025)
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2025)
VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors
by: Tang, Jimin, et al.
Published: (2026)
GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction
by: Cao, Xiao, et al.
Published: (2026)
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
by: Li, Xuanchen, et al.
Published: (2024)
TrackMAE: Video Representation Learning via Track Mask and Predict
by: Vandeghen, Renaud, et al.
Published: (2026)
Compositional Generative Model of Unbounded 4D Cities
by: Xie, Haozhe, et al.
Published: (2025)
MoCA-Video: Motion-Aware Concept Alignment for Consistent Video Editing
by: Zhang, Tong, et al.
Published: (2025)
Investigating Event-Based Cameras for Video Frame Interpolation in Sports
by: Deckyvere, Antoine, et al.
Published: (2024)
VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction
by: Hu, Yu, et al.
Published: (2025)
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
by: Mkhallati, Hassan, et al.
Published: (2023)
3D-JEPA: A Joint Embedding Predictive Architecture for 3D Self-Supervised Representation Learning
by: Hu, Naiwen, et al.
Published: (2024)
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
by: Zhang, Haiyu, et al.
Published: (2024)
GaussFusion: Improving 3D Reconstruction in the Wild with A Geometry-Informed Video Generator
by: Zhu, Liyuan, et al.
Published: (2026)
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
by: Kong, Hanyang, et al.
Published: (2025)
GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction
by: Su, Mai, et al.
Published: (2026)
4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video
by: Lyu, Jin, et al.
Published: (2026)
Phased One-Step Adversarial Equilibrium for Video Diffusion Models
by: Cheng, Jiaxiang, et al.
Published: (2025)
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
by: Zhang, Wenxuan, et al.
Published: (2024)
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
by: Li, Haitian, et al.
Published: (2026)
Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
by: Zhu, Lin, et al.
Published: (2024)