| Main Authors: | Xiao, Qinfeng, Mei, Guofeng, Liu, Qilong, Yi, Chenyuan, Poiesi, Fabio, Zhang, Jian, Yang, Bo, Kit-lun, Yick |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.07652 |
Similar Items
Universal 3D Shape Matching via Coarse-to-Fine Language Guidance
by: Xiao, Qinfeng, et al.
Published: (2026)
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)
Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
by: Mei, Guofeng, et al.
Published: (2023)
Multimodal Fusion SLAM with Fourier Attention
by: Zhou, Youjie, et al.
Published: (2025)
Fully-Geometric Cross-Attention for Point Cloud Registration
by: Wang, Weijie, et al.
Published: (2025)
PerLA: Perceptive 3D Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)
Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
by: Mei, Guofeng, et al.
Published: (2026)
Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
by: Ren, Bin, et al.
Published: (2025)
ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models
by: Wang, Weijie, et al.
Published: (2023)
Free-form language-based robotic reasoning and grasping
by: Jiao, Runyu, et al.
Published: (2025)
Obstruction reasoning for robotic grasping
by: Jiao, Runyu, et al.
Published: (2025)
Action-guided generation of 3D functionality segmentation data
by: Corsetti, Jaime, et al.
Published: (2025)
Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding
by: Mei, Guofeng, et al.
Published: (2025)
Accurate and efficient zero-shot 6D pose estimation with frozen foundation models
by: Caraffa, Andrea, et al.
Published: (2025)
$S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models
by: Yin, Xiaojie, et al.
Published: (2024)
Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection
by: Mekhalfi, Mohamed Lamine, et al.
Published: (2025)
GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation
by: Li, Abiao, et al.
Published: (2024)
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
by: Li, Jinlong, et al.
Published: (2025)
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
by: Caraffa, Andrea, et al.
Published: (2023)
Distilling 3D distinctive local descriptors for 6D pose estimation
by: Hamza, Amir, et al.
Published: (2025)
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)
Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study
by: Zhu, Qinfeng, et al.
Published: (2024)
Graph Matching Optimization Network for Point Cloud Registration
by: Wu, Qianliang, et al.
Published: (2023)
Constrained Prompt Enhancement for Improving Zero-Shot Generalization of Vision-Language Models
by: Yin, Xiaojie, et al.
Published: (2025)
View-on-Graph: Zero-shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs
by: Liu, Yuanyuan, et al.
Published: (2025)
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
by: Dünkel, Olaf, et al.
Published: (2026)
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
by: Park, Chunghyun, et al.
Published: (2024)
Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)
Functionality understanding and segmentation in 3D scenes
by: Corsetti, Jaime, et al.
Published: (2024)
An analysis of vision-language models for fabric retrieval
by: Giuliari, Francesco, et al.
Published: (2025)
Generative 6D Pose Estimation via Conditional Flow Matching
by: Hamza, Amir, et al.
Published: (2026)
Novel class discovery meets foundation models for 3D semantic segmentation
by: Riz, Luigi, et al.
Published: (2023)
OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
by: Weijler, Lisa, et al.
Published: (2025)
On Unsupervised Partial Shape Correspondence
by: Bracha, Amit, et al.
Published: (2023)
Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
by: Seutin, Corentin, et al.
Published: (2026)
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
by: Bortolon, Matteo, et al.
Published: (2024)
Towards Cross-View Point Correspondence in Vision-Language Models
by: Wang, Yipu, et al.
Published: (2025)
Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild
by: Im, Jiin, et al.
Published: (2026)
IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model
by: Bortolon, Matteo, et al.
Published: (2024)
CAS-IQA: Teaching Vision-Language Models for Synthetic Angiography Quality Assessment
by: Wang, Bo, et al.
Published: (2025)