:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yang; Tzun, Teoh Tze; Hern, Lim Wei; Wang, Haonan; Kawaguchi, Kenji
Format:	Preprint
Published:	2023
Subjects:	Multimedia; Artificial Intelligence; Graphics
Online Access:	https://arxiv.org/abs/2311.12803

Similar Items

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
by: Zhang, Yang, et al.
Published: (2024)

Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation
by: Huang, Zikai, et al.
Published: (2025)

VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
by: Li, Xiang, et al.
Published: (2023)

Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
by: S, Sridhar, et al.
Published: (2025)

KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
by: Lyu, Tianle, et al.
Published: (2025)

Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application
by: Kim, Bumsoo, et al.
Published: (2024)

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
by: Girdhar, Rohit, et al.
Published: (2023)

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
by: Sun, Zeyi, et al.
Published: (2024)

Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
by: Chen, Yi-Chun, et al.
Published: (2025)

Extreme Compression of Adaptive Neural Images
by: Hoshikawa, Leo, et al.
Published: (2024)

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
by: Zhang, Zewei, et al.
Published: (2024)

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
by: Shen, Qiuhong, et al.
Published: (2024)

Instant3D: Instant Text-to-3D Generation
by: Li, Ming, et al.
Published: (2023)

Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
by: Chiu, Pin-Yen, et al.
Published: (2025)

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
by: Hoe, Jiun Tian, et al.
Published: (2023)

d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining
by: Roy, Prasun, et al.
Published: (2025)

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)

Seeing World Dynamics in a Nutshell
by: Shen, Qiuhong, et al.
Published: (2025)

Generating Digital Models Using Text-to-3D and Image-to-3D Prompts: Critical Case Study
by: Ziatdinov, Rushan, et al.
Published: (2025)

A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)

HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR
by: Dai, Yudi, et al.
Published: (2024)

PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
by: Xie, Yifan, et al.
Published: (2024)

SAiD: Speech-driven Blendshape Facial Animation with Diffusion
by: Park, Inkyu, et al.
Published: (2023)

ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
by: Wang, Xuanchen, et al.
Published: (2025)

Lester: rotoscope animation through video object segmentation and tracking
by: Tous, Ruben
Published: (2024)

Zero-Shot Visual Deepfake Detection: Can AI Predict and Prevent Fake Content Before It's Created?
by: Sar, Ayan, et al.
Published: (2025)

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
by: Bahaj, Adil, et al.
Published: (2025)

Freehand Sketch Generation from Mechanical Components
by: Liao, Zhichao, et al.
Published: (2024)

LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)

Coral Model Generation from Single Images for Virtual Reality Applications
by: Fu, Jie, et al.
Published: (2024)

Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
by: Wang, Zhe, et al.
Published: (2024)

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
by: Zheng, Shuhong, et al.
Published: (2026)

Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning
by: Guo, Xin, et al.
Published: (2025)

DesignAsCode: Bridging Structural Editability and Visual Fidelity in Graphic Design Generation
by: Liu, Ziyuan, et al.
Published: (2026)

VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
by: Cha, SeungJu, et al.
Published: (2025)

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
by: Lin, Jiantao, et al.
Published: (2025)

ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer
by: Kim, Bumsoo, et al.
Published: (2024)

SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
by: Xu, Xiangyu, et al.
Published: (2024)

Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
by: Singer, Assaf, et al.
Published: (2025)

Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset
by: Kaur, Sukhandeep, et al.
Published: (2024)