:: Library Catalog

Bibliographic Details
Main Authors:	Pang, Haozhou; Ding, Tianwei; He, Lanshan; Gan, Qi
Format:	Preprint
Published:	2025
Subjects:	Graphics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.09645

Similar Items

LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis
by: Pang, Haozhou, et al.
Published: (2024)

TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
by: Dai, Yuqin, et al.
Published: (2025)

Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns
by: Li, Ronghui, et al.
Published: (2024)

A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
by: Zaouali, Mahmoud Chick, et al.
Published: (2025)

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
by: Wang, Yiping, et al.
Published: (2024)

Towards Understanding Graphical Perception in Large Multimodal Models
by: Zhang, Kai, et al.
Published: (2025)

SMooGPT: Stylized Motion Generation using Large Language Models
by: Zhong, Lei, et al.
Published: (2025)

SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control
by: Gan, Ji, et al.
Published: (2025)

Neural Cone Radiosity for Interactive Global Illumination with Glossy Materials
by: Ren, Jierui, et al.
Published: (2025)

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
by: Fang, Shuangkang, et al.
Published: (2025)

FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
by: Jing, Liqiang, et al.
Published: (2025)

Is this chart lying to me? Automating the detection of misleading visualizations
by: Tonglet, Jonathan, et al.
Published: (2025)

TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting
by: Tang, Kaiyuan, et al.
Published: (2025)

FlairGPT: Repurposing LLMs for Interior Designs
by: Littlefair, Gabrielle, et al.
Published: (2025)

Co-Layout: LLM-driven Co-optimization for Interior Layout
by: Xiang, Chucheng, et al.
Published: (2025)

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
by: Gal, Rinon, et al.
Published: (2024)

T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Synthesis in Controllable Concept Art Generation
by: Sun, Zhenhong, et al.
Published: (2024)

CAP: Evaluation of Persuasive and Creative Image Generation
by: Aghazadeh, Aysan, et al.
Published: (2024)

TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
by: Guan, Jiazhi, et al.
Published: (2024)

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
by: Liu, Shiyong, et al.
Published: (2025)

Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models
by: Hong, Xiaolin, et al.
Published: (2024)

Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)

ORACLE: Orchestrate NPC Daily Activities using Contrastive Learning with Transformer-CVAE
by: Hong, Seong-Eun, et al.
Published: (2026)

VLMaterial: Procedural Material Generation with Large Vision-Language Models
by: Li, Beichen, et al.
Published: (2025)

PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
by: Zhou, Rixin, et al.
Published: (2023)

A Simplified Positional Cell Type Visualization using Spatially Aggregated Clusters
by: Mason, Lee, et al.
Published: (2024)

Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling
by: Dai, Yuxin, et al.
Published: (2024)

EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing
by: Yang, Xijie, et al.
Published: (2026)

Image Generation Models: A Technical History
by: Shirvani, Rouzbeh
Published: (2026)

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
by: Wu, Ronghuan, et al.
Published: (2024)

PALP: Prompt Aligned Personalization of Text-to-Image Models
by: Arar, Moab, et al.
Published: (2024)

FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
by: Karras, Johanna, et al.
Published: (2026)

Real-Time Position-Aware View Synthesis from Single-View Input
by: Gond, Manu, et al.
Published: (2024)

Taking Language Embedded 3D Gaussian Splatting into the Wild
by: Wang, Yuze, et al.
Published: (2025)

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
by: Li, Wanhua, et al.
Published: (2025)

FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications
by: Tatsukawa, Yuki, et al.
Published: (2024)

ArchGPT: Understanding the World's Architectures with Large Multimodal Models
by: Wang, Yuze, et al.
Published: (2025)

CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models
by: Pyatov, Vladislav, et al.
Published: (2026)

MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
by: Mukhopadhyay, Srija, et al.
Published: (2024)

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation
by: He, Lanshan, et al.
Published: (2026)