Library Catalog

Bibliographic Details
Main Authors:	Garosi, Marco, Tedoldi, Riccardo, Boscaini, Davide, Mancini, Massimiliano, Sebe, Nicu, Poiesi, Fabio
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.04247

Similar Items

Accurate and efficient zero-shot 6D pose estimation with frozen foundation models
by: Caraffa, Andrea, et al.
Published: (2025)

Distilling 3D distinctive local descriptors for 6D pose estimation
by: Hamza, Amir, et al.
Published: (2025)

Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
by: Li, Jinlong, et al.
Published: (2025)

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
by: Caraffa, Andrea, et al.
Published: (2023)

Functionality understanding and segmentation in 3D scenes
by: Corsetti, Jaime, et al.
Published: (2024)

Generative 6D Pose Estimation via Conditional Flow Matching
by: Hamza, Amir, et al.
Published: (2026)

Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection
by: Mekhalfi, Mohamed Lamine, et al.
Published: (2025)

Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)

Fully-Geometric Cross-Attention for Point Cloud Registration
by: Wang, Weijie, et al.
Published: (2025)

High-resolution open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2024)

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
by: Mei, Guofeng, et al.
Published: (2023)

Action-guided generation of 3D functionality segmentation data
by: Corsetti, Jaime, et al.
Published: (2025)

AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)

Large Multimodal Models as General In-Context Classifiers
by: Garosi, Marco, et al.
Published: (2026)

Compositional Caching for Training-free Open-vocabulary Attribute Detection
by: Garosi, Marco, et al.
Published: (2025)

Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding
by: Mei, Guofeng, et al.
Published: (2025)

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
by: Ren, Bin, et al.
Published: (2025)

Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation
by: Corsetti, Jaime, Boscaini, Davide, Poiesi, Fabio
Published: (2026)

Cues3D: Unleashing the Power of Sole NeRF for Consistent and Unique Instances in Open-Vocabulary 3D Panoptic Segmentation
by: Xue, Feng, et al.
Published: (2025)

Safe Vision-Language Models via Unsafe Weights Manipulation
by: D'Incà, Moreno, et al.
Published: (2025)

GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models
by: D'Incà, Moreno, et al.
Published: (2024)

ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models
by: Wang, Weijie, et al.
Published: (2023)

Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
by: Xu, Xiaoxu, et al.
Published: (2024)

LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
by: Liu, Xuexun, et al.
Published: (2024)

Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)

RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection
by: Song, Yue, et al.
Published: (2023)

High-Fidelity 3D Facial Avatar Synthesis with Controllable Fine-Grained Expressions
by: He, Yikang, et al.
Published: (2026)

CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
by: Nardon, Mattia, et al.
Published: (2025)

3D Weakly Supervised Semantic Segmentation via Class-Aware and Geometry-Guided Pseudo-Label Refinement
by: Xu, Xiaoxu, et al.
Published: (2025)

AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding
by: Wang, Yidan, et al.
Published: (2025)

PoInit-of-View: Poisoning Initialization of Views Transfers Across Multiple 3D Reconstruction Systems
by: Wang, Weijie, et al.
Published: (2026)

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
by: Xing, Songlong, et al.
Published: (2025)

Novel class discovery meets foundation models for 3D semantic segmentation
by: Riz, Luigi, et al.
Published: (2023)

Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation
by: Li, Jinlong, et al.
Published: (2025)

FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors
by: Li, Chenxi, et al.
Published: (2025)

Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models
by: Li, Jinlong, et al.
Published: (2026)

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
by: Dong, Jiahua, et al.
Published: (2025)

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
by: Bortolon, Matteo, et al.
Published: (2024)

Asymmetric GANs for Image-to-Image Translation
by: Tang, Hao, et al.
Published: (2019)