| Main Authors: | Shi, Yahao; Liu, Yang; Wu, Yanmin; Liu, Xing; Zhao, Chen; Luo, Jie; Zhou, Bin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2506.07489 |
| _version_ | 1866915333234753536 |
|---|---|
| author | Shi, Yahao; Liu, Yang; Wu, Yanmin; Liu, Xing; Zhao, Chen; Luo, Jie; Zhou, Bin |
| author_facet | Shi, Yahao; Liu, Yang; Wu, Yanmin; Liu, Xing; Zhao, Chen; Luo, Jie; Zhou, Bin |
| contents | We propose DriveAnyMesh, a method for driving a mesh with a monocular video. Current 4D generation techniques are a poor fit for modern rendering engines: implicit methods render inefficiently and are unfriendly to rasterization-based engines, while skeletal methods demand significant manual effort and lack cross-category generalization. Animating an existing 3D asset, instead of creating a 4D asset from scratch, demands a deep understanding of the input's 3D structure. To tackle these challenges, we present a 4D diffusion model that denoises sequences of latent sets, which are then decoded to produce mesh animations from point cloud trajectory sequences. These latent sets come from a transformer-based variational autoencoder that captures 3D shape and motion information simultaneously. A spatiotemporal, transformer-based diffusion model exchanges information across multiple latent frames, improving the efficiency and generalization of the generated results. Our experiments show that DriveAnyMesh can rapidly produce high-quality animations of complex motions and is compatible with modern rendering engines. The method has potential applications in both the gaming and film industries. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_07489 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2506.07489 |
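
The abstract says a spatiotemporal transformer exchanges information across multiple latent frames. One common way to get that behaviour is to alternate attention within a frame (spatial) with attention across frames (temporal). The sketch below illustrates that idea on a toy latent sequence of shape (frames, tokens, channels). It is an assumption for illustration only: the abstract does not say whether DriveAnyMesh factorizes attention this way, the attention here has no learned projections, and every name (`attend`, `spatial_pass`, `temporal_pass`, `spatiotemporal_block`) is hypothetical.

```python
import math
import random


def attend(tokens):
    """Toy single-head self-attention with no learned weights (an assumption):
    each token becomes a softmax-weighted average of all tokens in the group."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * value[d] for w, value in zip(weights, tokens)) / total
                    for d in range(dim)])
    return out


def spatial_pass(latents):
    """Mix the latent tokens inside each frame; frames stay independent."""
    return [attend(frame) for frame in latents]


def temporal_pass(latents):
    """Mix the same token slot across all frames; slots stay independent."""
    num_frames, num_tokens = len(latents), len(latents[0])
    out = [[None] * num_tokens for _ in range(num_frames)]
    for n in range(num_tokens):
        track = attend([latents[t][n] for t in range(num_frames)])
        for t in range(num_frames):
            out[t][n] = track[t]
    return out


def spatiotemporal_block(latents):
    """Spatial mixing followed by temporal mixing (hypothetical ordering)."""
    return temporal_pass(spatial_pass(latents))


# Toy latent sequence: T frames, N latent tokens per frame, D channels each.
random.seed(0)
T, N, D = 4, 3, 2
latents = [[[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
           for _ in range(T)]
mixed = spatiotemporal_block(latents)
```

In this sketch the spatial pass alone never lets one frame influence another; the temporal pass is what lets a change in one latent frame reach the others. A real model would add learned query, key and value projections, multiple heads, and conditioning on the video and the diffusion timestep.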