| Main Authors: | Zhang, Yang; Tzun, Teoh Tze; Hern, Lim Wei; Wang, Haonan; Kawaguchi, Kenji |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.12803 |
Similar Items
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
by: Zhang, Yang, et al.
Published: (2024)
Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation
by: Huang, Zikai, et al.
Published: (2025)
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
by: Li, Xiang, et al.
Published: (2023)
Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
by: S, Sridhar, et al.
Published: (2025)
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
by: Lyu, Tianle, et al.
Published: (2025)
Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application
by: Kim, Bumsoo, et al.
Published: (2024)
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
by: Girdhar, Rohit, et al.
Published: (2023)
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
by: Sun, Zeyi, et al.
Published: (2024)
Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
by: Chen, Yi-Chun, et al.
Published: (2025)
Extreme Compression of Adaptive Neural Images
by: Hoshikawa, Leo, et al.
Published: (2024)
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
by: Zhang, Zewei, et al.
Published: (2024)
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
by: Shen, Qiuhong, et al.
Published: (2024)
Instant3D: Instant Text-to-3D Generation
by: Li, Ming, et al.
Published: (2023)
Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
by: Chiu, Pin-Yen, et al.
Published: (2025)
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
by: Hoe, Jiun Tian, et al.
Published: (2023)
d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining
by: Roy, Prasun, et al.
Published: (2025)
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)
Seeing World Dynamics in a Nutshell
by: Shen, Qiuhong, et al.
Published: (2025)
Generating Digital Models Using Text-to-3D and Image-to-3D Prompts: Critical Case Study
by: Ziatdinov, Rushan, et al.
Published: (2025)
A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)
HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR
by: Dai, Yudi, et al.
Published: (2024)
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
by: Xie, Yifan, et al.
Published: (2024)
SAiD: Speech-driven Blendshape Facial Animation with Diffusion
by: Park, Inkyu, et al.
Published: (2023)
ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
by: Wang, Xuanchen, et al.
Published: (2025)
Lester: rotoscope animation through video object segmentation and tracking
by: Tous, Ruben
Published: (2024)
Zero-Shot Visual Deepfake Detection: Can AI Predict and Prevent Fake Content Before It's Created?
by: Sar, Ayan, et al.
Published: (2025)
PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
by: Bahaj, Adil, et al.
Published: (2025)
Freehand Sketch Generation from Mechanical Components
by: Liao, Zhichao, et al.
Published: (2024)
LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)
Coral Model Generation from Single Images for Virtual Reality Applications
by: Fu, Jie, et al.
Published: (2024)
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
by: Wang, Zhe, et al.
Published: (2024)
Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
by: Zheng, Shuhong, et al.
Published: (2026)
Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning
by: Guo, Xin, et al.
Published: (2025)
DesignAsCode: Bridging Structural Editability and Visual Fidelity in Graphic Design Generation
by: Liu, Ziyuan, et al.
Published: (2026)
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
by: Cha, SeungJu, et al.
Published: (2025)
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
by: Lin, Jiantao, et al.
Published: (2025)
ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer
by: Kim, Bumsoo, et al.
Published: (2024)
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
by: Xu, Xiangyu, et al.
Published: (2024)
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
by: Singer, Assaf, et al.
Published: (2025)
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset
by: Kaur, Sukhandeep, et al.
Published: (2024)