| Main Authors: | Zhang, Yixin, Hou, Yunzhong, Li, Longqi, Qin, Zhenyue, Liu, Yang, Yao, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.05933 |
Similar Items
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
by: Qin, Zhenyue, et al.
Published: (2024)
Learning Camera Movement Control from Real-World Drone Videos
by: Hou, Yunzhong, et al.
Published: (2024)
Decision-Driven Semantic Object Exploration for Legged Robots via Confidence-Calibrated Perception and Topological Subgoal Selection
by: Zhao, Guoyang, et al.
Published: (2025)
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News
by: Zhang, Qixuan, et al.
Published: (2024)
Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?
by: Liu, Yang, et al.
Published: (2026)
Visual Prompting in LLMs for Enhancing Emotion Recognition
by: Zhang, Qixuan, et al.
Published: (2024)
ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
by: Qin, Yiran, et al.
Published: (2023)
Pursuing Minimal Sufficiency in Spatial Reasoning
by: Guo, Yejie, et al.
Published: (2025)
Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding
by: Li, Jiahao, et al.
Published: (2026)
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
by: Cho, Seunghyuk, et al.
Published: (2025)
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
by: Cho, Seunghyuk, et al.
Published: (2025)
HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction
by: Zhao, Yueran, et al.
Published: (2025)
Active Visual Perception: Opportunities and Challenges
by: Li, Yian, et al.
Published: (2025)
SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
by: Si, Yunzhong, et al.
Published: (2024)
ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models
by: Li, Jiahao, et al.
Published: (2025)
Effective Training Data Synthesis for Improving MLLM Chart Understanding
by: Yang, Yuwei, et al.
Published: (2025)
ADAptation: Reconstruction-based Unsupervised Active Learning for Breast Ultrasound Diagnosis
by: Duan, Yaofei, et al.
Published: (2025)
JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration
by: Wang, Mingzi, et al.
Published: (2025)
Evaluating Time Awareness and Cross-modal Active Perception of Large Models via 4D Escape Room Task
by: Dong, Yurui, et al.
Published: (2026)
Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
by: Zhou, Yue, et al.
Published: (2026)
Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
by: Li, Jiahua, et al.
Published: (2025)
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
by: Li, Xueyang, et al.
Published: (2025)
UMind-VL: A Generalist Ultrasound Vision-Language Model for Unified Grounded Perception and Comprehensive Interpretation
by: Chen, Dengbo, et al.
Published: (2025)
Extreme Amodal Face Detection
by: Song, Changlin, et al.
Published: (2025)
Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
by: Wang, Ziyue, et al.
Published: (2024)
Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
by: Yue, Lu, et al.
Published: (2026)
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
by: Gao, Jian, et al.
Published: (2025)
FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
by: Chen, Muxi, et al.
Published: (2025)
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
by: Zhu, Muzhi, et al.
Published: (2025)
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
by: Leng, Xingjian, et al.
Published: (2025)
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
by: Zhu, Ziyu, et al.
Published: (2025)
Mamba-CAD: State Space Model For 3D Computer-Aided Design Generative Modeling
by: Li, Xueyang, et al.
Published: (2026)
ESAM++: Efficient Online 3D Perception on the Edge
by: Liu, Qin, et al.
Published: (2026)
MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval
by: Yao, Gongxin, et al.
Published: (2024)
Communication-Efficient Collaborative Perception via Information Filling with Codebook
by: Hu, Yue, et al.
Published: (2024)
Learn 3D VQA Better with Active Selection and Reannotation
by: Zhou, Shengli, et al.
Published: (2025)
Pragmatic Communication in Multi-Agent Collaborative Perception
by: Hu, Yue, et al.
Published: (2024)
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
by: Li, Jiahao, et al.
Published: (2025)