Library Catalog

Bibliographic Details
Main Authors:	Corsetti, Jaime, Giuliari, Francesco, Boscaini, Davide, Hermosilla, Pedro, Pilzer, Andrea, Mei, Guofeng, Delitzas, Alexandros, Engelmann, Francis, Poiesi, Fabio
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.23230

Similar Items

Functionality understanding and segmentation in 3D scenes
by: Corsetti, Jaime, et al.
Published: (2024)

High-resolution open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2024)

Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation
by: Corsetti, Jaime, Boscaini, Davide, Poiesi, Fabio
Published: (2026)

Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)

Accurate and efficient zero-shot 6D pose estimation with frozen foundation models
by: Caraffa, Andrea, et al.
Published: (2025)

Distilling 3D distinctive local descriptors for 6D pose estimation
by: Hamza, Amir, et al.
Published: (2025)

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
by: Caraffa, Andrea, et al.
Published: (2023)

Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection
by: Mekhalfi, Mohamed Lamine, et al.
Published: (2025)

Generative 6D Pose Estimation via Conditional Flow Matching
by: Hamza, Amir, et al.
Published: (2026)

Free-form language-based robotic reasoning and grasping
by: Jiao, Runyu, et al.
Published: (2025)

Obstruction reasoning for robotic grasping
by: Jiao, Runyu, et al.
Published: (2025)

3D Part Segmentation via Geometric Aggregation of 2D Visual Features
by: Garosi, Marco, et al.
Published: (2024)

An analysis of vision-language models for fabric retrieval
by: Giuliari, Francesco, et al.
Published: (2025)

Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
by: Mei, Guofeng, et al.
Published: (2023)

Search3D: Hierarchical Open-Vocabulary 3D Segmentation
by: Takmaz, Ayca, et al.
Published: (2024)

AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)

Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces
by: Hu, Xinggang, et al.
Published: (2026)

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
by: Zhang, Chenyangguang, et al.
Published: (2025)

OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
by: Weijler, Lisa, et al.
Published: (2025)

PerLA: Perceptive 3D Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)

Multimodal Fusion SLAM with Fourier Attention
by: Zhou, Youjie, et al.
Published: (2025)

Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
by: Mei, Guofeng, et al.
Published: (2026)

Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)

Fully-Geometric Cross-Attention for Point Cloud Registration
by: Wang, Weijie, et al.
Published: (2025)

REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
by: Huang, Zhao, et al.
Published: (2025)

Novel class discovery meets foundation models for 3D semantic segmentation
by: Riz, Luigi, et al.
Published: (2023)

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
by: Ren, Bin, et al.
Published: (2025)

Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)

CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
by: Nardon, Mattia, et al.
Published: (2025)

FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos
by: Delitzas, Alexandros, et al.
Published: (2026)

ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models
by: Wang, Weijie, et al.
Published: (2023)

GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence
by: Xiao, Qinfeng, et al.
Published: (2026)

SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories
by: Gavryushin, Alexey, et al.
Published: (2025)

Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding
by: Mei, Guofeng, et al.
Published: (2025)

Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints
by: Zhang, Chenyangguang, et al.
Published: (2026)

Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds
by: Weijler, Lisa, et al.
Published: (2025)

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
by: Scarpellini, Gianluca, et al.
Published: (2024)

Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
by: Li, Jinlong, et al.
Published: (2025)

When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
by: Papi, Sara, et al.
Published: (2023)