Bibliographic Details
Main Authors:	Gou, Dongqiang; He, Xuming
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.17647
Similar Items

Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
by: Wan, Xinhang, et al.
Published: (2025)

Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
by: Xie, Jiangnan, et al.
Published: (2025)

DAG: Unleash the Potential of Diffusion Model for Open-Vocabulary 3D Affordance Grounding
by: Wang, Hanqing, et al.
Published: (2025)

Open-Vocabulary Semantic Part Segmentation of 3D Human
by: Suzuki, Keito, et al.
Published: (2025)

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
by: Shao, Yawen, et al.
Published: (2024)

Affogato: Learning Open-Vocabulary Affordance Grounding with Automated Data Generation at Scale
by: Lee, Junha, et al.
Published: (2025)

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
by: Chu, Hengshuo, et al.
Published: (2025)

Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors
by: Xu, Peiran, et al.
Published: (2025)

MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment
by: Li, Bingyu, et al.
Published: (2025)

Semantic Alignment in Hyperbolic Space for Open-Vocabulary Semantic Segmentation
by: Truong, Hoang M., et al.
Published: (2026)

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
by: Li, Ruihuang, et al.
Published: (2024)

AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis
by: Wu, Xiaofei, et al.
Published: (2026)

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
by: Gao, Xianqiang, et al.
Published: (2024)

Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
by: Mazzucco, Silvio, et al.
Published: (2025)

In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
by: Kang, Dahyun, et al.
Published: (2024)

GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation
by: Tao, Xujing, et al.
Published: (2026)

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
by: Miao, Yang, et al.
Published: (2025)

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)

Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement
by: He, Lian, et al.
Published: (2025)

VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection
by: Sun, Haowen, et al.
Published: (2026)

Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
by: Qiang, Sunyuan, et al.
Published: (2024)

OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
by: Chow, Adrian, et al.
Published: (2025)

Open-Vocabulary Federated Learning with Multimodal Prototyping
by: Zeng, Huimin, et al.
Published: (2024)

ExpAlign: Expectation-Guided Vision-Language Alignment for Open-Vocabulary Grounding
by: Hu, Junyi, et al.
Published: (2026)

SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
by: Li, Siyuan, et al.
Published: (2024)

GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping
by: Ma, Teli, et al.
Published: (2024)

Affostruction: 3D Affordance Grounding with Generative Reconstruction
by: Park, Chunghyun, et al.
Published: (2026)

Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances
by: Wang, Qirui, et al.
Published: (2026)

Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding
by: Tai, Hanchen, et al.
Published: (2024)

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
by: Li, Rong, et al.
Published: (2024)

VAGNet: Grounding 3D Affordance from Human-Object Interactions in Videos
by: Mao, Aihua, et al.
Published: (2026)

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2025)

Interpretable Affordance Detection on 3D Point Clouds with Probabilistic Prototypes
by: Li, Maximilian Xiling, et al.
Published: (2025)

PCA-Seg: Revisiting Cost Aggregation for Open-Vocabulary Semantic and Part Segmentation
by: Yin, Jianjian, et al.
Published: (2026)

OpenVidVRD: Open-Vocabulary Video Visual Relation Detection via Prompt-Driven Semantic Space Alignment
by: Liu, Qi, et al.
Published: (2025)

Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph
by: Linok, Sergey, et al.
Published: (2025)

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)

From Open-Vocabulary to Vocabulary-Free Semantic Segmentation
by: Reichard, Klara, et al.
Published: (2025)

Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
by: Zhao, Youjun, et al.
Published: (2025)

PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum
by: Zhang, Shiqi, et al.
Published: (2025)