:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Yuheng; Gao, Xiangbo; Chen, Tianhao; Chen, Xinghao; Yin, Qing; Tu, Zhengzhong; Lee, Dongman
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Graphics; Multimedia
Online Access:	https://arxiv.org/abs/2605.14382

Similar Items

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
by: Lionar, Stefan, et al.
Published: (2025)

Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing
by: Qi, Leyi, et al.
Published: (2026)

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
by: Cheng, Shihao, et al.
Published: (2026)

Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
by: He, Liu, et al.
Published: (2024)

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
by: Lin, Jiantao, et al.
Published: (2025)

VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
by: Cha, SeungJu, et al.
Published: (2025)

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
by: Guan, Jiazhi, et al.
Published: (2025)

Casual3DHDR: Deblurring High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos
by: Gong, Shucheng, et al.
Published: (2025)

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
by: Hoe, Jiun Tian, et al.
Published: (2023)

InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
by: Hoe, Jiun Tian, et al.
Published: (2025)

STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
by: Chai, Zenghao, et al.
Published: (2024)

Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
by: Hu, Xiaowei, et al.
Published: (2024)

DreamCinema: Cinematic Transfer with Free Camera and 3D Character
by: Chen, Weiliang, et al.
Published: (2024)

Representing Long Volumetric Video with Temporal Gaussian Hierarchy
by: Xu, Zhen, et al.
Published: (2024)

FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
by: Zheng, Jiayi, et al.
Published: (2025)

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
by: Girdhar, Rohit, et al.
Published: (2023)

Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video
by: Hsieh, Jhen
Published: (2024)

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers
by: Flynn, John, et al.
Published: (2026)

MusicScore: A Dataset for Music Score Modeling and Generation
by: Lin, Yuheng, et al.
Published: (2024)

Sound Sparks Motion: Audio and Text Tuning for Video Editing
by: Razlighi, AmirHossein Naghi, et al.
Published: (2026)

ImagenHub: Standardizing the evaluation of conditional image generation models
by: Ku, Max, et al.
Published: (2023)

HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
by: Dong, Wenqi, et al.
Published: (2025)

ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
by: Guan, Jiazhi, et al.
Published: (2024)

PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation
by: Zhao, Junchuan, et al.
Published: (2026)

Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction
by: Zhou, Yifan, et al.
Published: (2025)

SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors
by: Xu, Rui, et al.
Published: (2024)

DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
by: Xu, Chuanzhi, et al.
Published: (2026)

Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization
by: Xu, Yu, et al.
Published: (2024)

MesonGS++: Post-training Compression of 3D Gaussian Splatting with Hyperparameter Searching
by: Xie, Shuzhao, et al.
Published: (2026)

SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
by: Huang, Yiheng, et al.
Published: (2025)

ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
by: Wang, Xuanchen, et al.
Published: (2025)

MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
by: Gupta, Prerit, et al.
Published: (2025)

A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)

DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
by: Zhang, Hengyuan, et al.
Published: (2025)

SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
by: Pham, Kien T., et al.
Published: (2025)

Improving Generative Adversarial Network Generalization for Facial Expression Synthesis
by: Akram, Arbish, et al.
Published: (2026)

Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting
by: Wang, Zijian, et al.
Published: (2025)

Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
by: Azzarelli, Adrian, et al.
Published: (2025)

altiro3D: Scene representation from single image and novel view synthesis
by: Canessa, E., et al.
Published: (2023)

Exploring Palette based Color Guidance in Diffusion Models
by: Qiu, Qianru, et al.
Published: (2025)