:: Library Catalog

Bibliographic Details
Main Authors:	Xiao, Hongcan; Xiao, Xinyue; Wang, Yilin; Zhang, Yue; Qi, Yonggang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.08042

Similar Items

StickMotion: Generating 3D Human Motions by Drawing a Stickman
by: Wang, Tao, et al.
Published: (2025)

ShadowDraw: From Any Object to Shadow-Drawing Compositional Art
by: Luo, Rundong, et al.
Published: (2025)

VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test
by: Wu, Meiqi, et al.
Published: (2025)

From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
by: Khan, Muhammad Tayyab, et al.
Published: (2025)

Text-Enhanced Panoptic Symbol Spotting in CAD Drawings
by: Liu, Xianlin, et al.
Published: (2025)

ViRED: Prediction of Visual Relations in Engineering Drawings
by: Gu, Chao, et al.
Published: (2024)

SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
by: Chang, Yifan, et al.
Published: (2025)

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding
by: Kou, Qian, et al.
Published: (2026)

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)

DrawMotion: Generating 3D Human Motions by Freehand Drawing
by: Wang, Tao, et al.
Published: (2026)

Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction
by: Khan, Muhammad Tayyab, et al.
Published: (2024)

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
by: Haque, Nafiul, et al.
Published: (2026)

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
by: Zeng, Ziyun, et al.
Published: (2025)

DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
by: Xu, Chuanzhi, et al.
Published: (2026)

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue
by: Hakimov, Sherzod, et al.
Published: (2026)

Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers
by: Turri, Evelyn, et al.
Published: (2026)

PCEvE: Part Contribution Evaluation Based Model Explanation for Human Figure Drawing Assessment and Beyond
by: Lee, Jongseo, et al.
Published: (2024)

Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation
by: Shi, Hanlei, et al.
Published: (2025)

Generating Sketches in a Hierarchical Auto-Regressive Process for Flexible Sketch Drawing Manipulation at Stroke-Level
by: Zang, Sicong, et al.
Published: (2025)

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
by: Lu, Guansong, et al.
Published: (2023)

Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors
by: Sterzinger, Rafael, et al.
Published: (2024)

Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer
by: Khan, Muhammad Tayyab, et al.
Published: (2025)

PyPotteryInk: One-Step Diffusion Model for Sketch to Publication-ready Archaeological Drawings
by: Cardarelli, Lorenzo
Published: (2025)

Pencils to Pixels: A Systematic Study of Creative Drawings across Children, Adults and AI
by: Nath, Surabhi S, et al.
Published: (2025)

How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM
by: Zha, Jirong, et al.
Published: (2025)

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
by: Kim, Hyungjin, et al.
Published: (2025)

Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models
by: Wang, Xiaoyan, et al.
Published: (2025)

3D-Agent: Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation
by: Zhang, Jusheng, et al.
Published: (2026)

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding
by: Ogunleye, Makanjuola, et al.
Published: (2026)

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
by: Yang, Jihan, et al.
Published: (2023)

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?
by: Zhang, Yifan, et al.
Published: (2024)

Semantic Aware Feature Extraction for Enhanced 3D Reconstruction
by: Nap, Ronald, et al.
Published: (2026)

Speed3R: Sparse Feed-forward 3D Reconstruction Models
by: Ren, Weining, et al.
Published: (2026)

Real-Time Intuitive AI Drawing System for Collaboration: Enhancing Human Creativity through Formal and Contextual Intent Integration
by: Song, Jookyung, et al.
Published: (2025)

A Multi-Stage Hybrid Framework for Automated Interpretation of Multi-View Engineering Drawings Using Vision Language Model
by: Khan, Muhammad Tayyab, et al.
Published: (2025)

Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image
by: Yang, Yuxiao, et al.
Published: (2025)

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)

Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
by: Xu, Zhou, et al.
Published: (2026)

TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection
by: Qi, Qiang, et al.
Published: (2025)

Hyperbolic Contrastive Learning for Hierarchical 3D Point Cloud Embedding
by: Liu, Yingjie, et al.
Published: (2025)