| Main Authors: | Pang, Haozhou, Ding, Tianwei, He, Lanshan, Gan, Qi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.09645 |
Similar Items
LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis
by: Pang, Haozhou, et al.
Published: (2024)
TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
by: Dai, Yuqin, et al.
Published: (2025)
Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns
by: Li, Ronghui, et al.
Published: (2024)
A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
by: Zaouali, Mahmoud Chick, et al.
Published: (2025)
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
by: Wang, Yiping, et al.
Published: (2024)
Towards Understanding Graphical Perception in Large Multimodal Models
by: Zhang, Kai, et al.
Published: (2025)
SMooGPT: Stylized Motion Generation using Large Language Models
by: Zhong, Lei, et al.
Published: (2025)
SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control
by: Gan, Ji, et al.
Published: (2025)
Neural Cone Radiosity for Interactive Global Illumination with Glossy Materials
by: Ren, Jierui, et al.
Published: (2025)
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
by: Fang, Shuangkang, et al.
Published: (2025)
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
by: Jing, Liqiang, et al.
Published: (2025)
Is this chart lying to me? Automating the detection of misleading visualizations
by: Tonglet, Jonathan, et al.
Published: (2025)
TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting
by: Tang, Kaiyuan, et al.
Published: (2025)
FlairGPT: Repurposing LLMs for Interior Designs
by: Littlefair, Gabrielle, et al.
Published: (2025)
Co-Layout: LLM-driven Co-optimization for Interior Layout
by: Xiang, Chucheng, et al.
Published: (2025)
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
by: Gal, Rinon, et al.
Published: (2024)
T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Synthesis in Controllable Concept Art Generation
by: Sun, Zhenhong, et al.
Published: (2024)
CAP: Evaluation of Persuasive and Creative Image Generation
by: Aghazadeh, Aysan, et al.
Published: (2024)
TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
by: Guan, Jiazhi, et al.
Published: (2024)
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
by: Liu, Shiyong, et al.
Published: (2025)
Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models
by: Hong, Xiaolin, et al.
Published: (2024)
Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)
ORACLE: Orchestrate NPC Daily Activities using Contrastive Learning with Transformer-CVAE
by: Hong, Seong-Eun, et al.
Published: (2026)
VLMaterial: Procedural Material Generation with Large Vision-Language Models
by: Li, Beichen, et al.
Published: (2025)
PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
by: Zhou, Rixin, et al.
Published: (2023)
A Simplified Positional Cell Type Visualization using Spatially Aggregated Clusters
by: Mason, Lee, et al.
Published: (2024)
Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling
by: Dai, Yuxin, et al.
Published: (2024)
EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing
by: Yang, Xijie, et al.
Published: (2026)
Image Generation Models: A Technical History
by: Shirvani, Rouzbeh
Published: (2026)
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
by: Wu, Ronghuan, et al.
Published: (2024)
PALP: Prompt Aligned Personalization of Text-to-Image Models
by: Arar, Moab, et al.
Published: (2024)
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
by: Karras, Johanna, et al.
Published: (2026)
Real-Time Position-Aware View Synthesis from Single-View Input
by: Gond, Manu, et al.
Published: (2024)
Taking Language Embedded 3D Gaussian Splatting into the Wild
by: Wang, Yuze, et al.
Published: (2025)
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
by: Li, Wanhua, et al.
Published: (2025)
FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications
by: Tatsukawa, Yuki, et al.
Published: (2024)
ArchGPT: Understanding the World's Architectures with Large Multimodal Models
by: Wang, Yuze, et al.
Published: (2025)
CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models
by: Pyatov, Vladislav, et al.
Published: (2026)
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
by: Mukhopadhyay, Srija, et al.
Published: (2024)
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation
by: He, Lanshan, et al.
Published: (2026)