:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yixin; Hou, Yunzhong; Li, Longqi; Qin, Zhenyue; Liu, Yang; Yao, Yue
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.05933

Similar Items

HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
by: Qin, Zhenyue, et al.
Published: (2024)

Learning Camera Movement Control from Real-World Drone Videos
by: Hou, Yunzhong, et al.
Published: (2024)

Decision-Driven Semantic Object Exploration for Legged Robots via Confidence-Calibrated Perception and Topological Subgoal Selection
by: Zhao, Guoyang, et al.
Published: (2025)

Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News
by: Zhang, Qixuan, et al.
Published: (2024)

Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?
by: Liu, Yang, et al.
Published: (2026)

Visual Prompting in LLMs for Enhancing Emotion Recognition
by: Zhang, Qixuan, et al.
Published: (2024)

ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
by: Qin, Yiran, et al.
Published: (2023)

Pursuing Minimal Sufficiency in Spatial Reasoning
by: Guo, Yejie, et al.
Published: (2025)

Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding
by: Li, Jiahao, et al.
Published: (2026)

Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
by: Cho, Seunghyuk, et al.
Published: (2025)

GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
by: Cho, Seunghyuk, et al.
Published: (2025)

HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction
by: Zhao, Yueran, et al.
Published: (2025)

Active Visual Perception: Opportunities and Challenges
by: Li, Yian, et al.
Published: (2025)

SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
by: Si, Yunzhong, et al.
Published: (2024)

ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models
by: Li, Jiahao, et al.
Published: (2025)

Effective Training Data Synthesis for Improving MLLM Chart Understanding
by: Yang, Yuwei, et al.
Published: (2025)

ADAptation: Reconstruction-based Unsupervised Active Learning for Breast Ultrasound Diagnosis
by: Duan, Yaofei, et al.
Published: (2025)

JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration
by: Wang, Mingzi, et al.
Published: (2025)

Evaluating Time Awareness and Cross-modal Active Perception of Large Models via 4D Escape Room Task
by: Dong, Yurui, et al.
Published: (2026)

Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
by: Zhou, Yue, et al.
Published: (2026)

Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
by: Li, Jiahua, et al.
Published: (2025)

Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
by: Li, Xueyang, et al.
Published: (2025)

UMind-VL: A Generalist Ultrasound Vision-Language Model for Unified Grounded Perception and Comprehensive Interpretation
by: Chen, Dengbo, et al.
Published: (2025)

Extreme Amodal Face Detection
by: Song, Changlin, et al.
Published: (2025)

Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
by: Wang, Ziyue, et al.
Published: (2024)

Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
by: Yue, Lu, et al.
Published: (2026)

ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
by: Gao, Jian, et al.
Published: (2025)

FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
by: Chen, Muxi, et al.
Published: (2025)

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
by: Zhu, Muzhi, et al.
Published: (2025)

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
by: Leng, Xingjian, et al.
Published: (2025)

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
by: Zhu, Ziyu, et al.
Published: (2025)

Mamba-CAD: State Space Model For 3D Computer-Aided Design Generative Modeling
by: Li, Xueyang, et al.
Published: (2026)

ESAM++: Efficient Online 3D Perception on the Edge
by: Liu, Qin, et al.
Published: (2026)

MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval
by: Yao, Gongxin, et al.
Published: (2024)

Communication-Efficient Collaborative Perception via Information Filling with Codebook
by: Hu, Yue, et al.
Published: (2024)

Learn 3D VQA Better with Active Selection and Reannotation
by: Zhou, Shengli, et al.
Published: (2025)

Pragmatic Communication in Multi-Agent Collaborative Perception
by: Hu, Yue, et al.
Published: (2024)

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
by: Li, Jiahao, et al.
Published: (2025)