| Main Authors: | Corsetti, Jaime, Giuliari, Francesco, Boscaini, Davide, Hermosilla, Pedro, Pilzer, Andrea, Mei, Guofeng, Delitzas, Alexandros, Engelmann, Francis, Poiesi, Fabio |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.23230 |
Similar Items
Functionality understanding and segmentation in 3D scenes
by: Corsetti, Jaime, et al.
Published: (2024)
High-resolution open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2024)
Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation
by: Jaime Corsetti, Davide Boscaini, Fabio Poiesi
Published: (2026)
Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)
Accurate and efficient zero-shot 6D pose estimation with frozen foundation models
by: Caraffa, Andrea, et al.
Published: (2025)
Distilling 3D distinctive local descriptors for 6D pose estimation
by: Hamza, Amir, et al.
Published: (2025)
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
by: Caraffa, Andrea, et al.
Published: (2023)
Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection
by: Mekhalfi, Mohamed Lamine, et al.
Published: (2025)
Generative 6D Pose Estimation via Conditional Flow Matching
by: Hamza, Amir, et al.
Published: (2026)
Free-form language-based robotic reasoning and grasping
by: Jiao, Runyu, et al.
Published: (2025)
Obstruction reasoning for robotic grasping
by: Jiao, Runyu, et al.
Published: (2025)
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
by: Garosi, Marco, et al.
Published: (2024)
An analysis of vision-language models for fabric retrieval
by: Giuliari, Francesco, et al.
Published: (2025)
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)
Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
by: Mei, Guofeng, et al.
Published: (2023)
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
by: Takmaz, Ayca, et al.
Published: (2024)
AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)
Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces
by: Hu, Xinggang, et al.
Published: (2026)
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
by: Zhang, Chenyangguang, et al.
Published: (2025)
OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
by: Weijler, Lisa, et al.
Published: (2025)
PerLA: Perceptive 3D Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)
Multimodal Fusion SLAM with Fourier Attention
by: Zhou, Youjie, et al.
Published: (2025)
Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
by: Mei, Guofeng, et al.
Published: (2026)
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)
Fully-Geometric Cross-Attention for Point Cloud Registration
by: Wang, Weijie, et al.
Published: (2025)
REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
by: Huang, Zhao, et al.
Published: (2025)
Novel class discovery meets foundation models for 3D semantic segmentation
by: Riz, Luigi, et al.
Published: (2023)
Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
by: Ren, Bin, et al.
Published: (2025)
Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)
CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
by: Nardon, Mattia, et al.
Published: (2025)
FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos
by: Delitzas, Alexandros, et al.
Published: (2026)
ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models
by: Wang, Weijie, et al.
Published: (2023)
GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence
by: Xiao, Qinfeng, et al.
Published: (2026)
SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories
by: Gavryushin, Alexey, et al.
Published: (2025)
Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding
by: Mei, Guofeng, et al.
Published: (2025)
Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints
by: Zhang, Chenyangguang, et al.
Published: (2026)
Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds
by: Weijler, Lisa, et al.
Published: (2025)
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
by: Scarpellini, Gianluca, et al.
Published: (2024)
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
by: Li, Jinlong, et al.
Published: (2025)
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
by: Papi, Sara, et al.
Published: (2023)