:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Junyi; Bai, Tian; Wu, Fengyi; Li, Wenyan; Peng, Zhenming; Zhang, Yi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.22666

Similar Items

P$^2$HCT: Plug-and-Play Hierarchical C2F Transformer for Multi-Scale Feature Fusion
by: Hu, Junyi, et al.
Published: (2025)

GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting
by: Alegret, Elena, et al.
Published: (2025)

Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
by: Fang, Hao, et al.
Published: (2024)

Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
by: Xie, Jiangnan, et al.
Published: (2025)

Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
by: Wang, Chenhao, et al.
Published: (2026)

Neural Spatial-Temporal Tensor Representation for Infrared Small Target Detection
by: Wu, Fengyi, et al.
Published: (2024)

InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer
by: Yuan, Muyao, et al.
Published: (2025)

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
by: Li, Yunheng, et al.
Published: (2024)

RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation
by: Wu, Fengyi, et al.
Published: (2025)

LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment
by: Hu, Junyi, et al.
Published: (2026)

Open-Vocabulary Video Anomaly Detection
by: Wu, Peng, et al.
Published: (2023)

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
by: Zeng, Haoxi, et al.
Published: (2026)

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
by: Miao, Yang, et al.
Published: (2025)

Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
by: Qiang, Sunyuan, et al.
Published: (2024)

Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
by: Gou, Dongqiang, et al.
Published: (2026)

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
by: Hu, Yupeng, et al.
Published: (2025)

MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)

DRPCA-Net: Make Robust PCA Great Again for Infrared Small Target Detection
by: Xiong, Zihao, et al.
Published: (2025)

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
by: Ranasinghe, Yasiru, et al.
Published: (2026)

Open-Vocabulary Camouflaged Object Segmentation with Cascaded Vision Language Models
by: Zhao, Kai, et al.
Published: (2025)

Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models
by: Rahman, Muhammad Atta ur, et al.
Published: (2025)

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
by: Li, Ruihuang, et al.
Published: (2024)

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
by: Wu, Size, et al.
Published: (2023)

Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
by: Mazzucco, Silvio, et al.
Published: (2025)

ComAlign: Compositional Alignment in Vision-Language Models
by: Abdollah, Ali, et al.
Published: (2024)

Modest-Align: Data-Efficient Alignment for Vision-Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
by: Ma, Ziqiao, et al.
Published: (2023)

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)

In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
by: Kang, Dahyun, et al.
Published: (2024)

ExpPortrait: Expressive Portrait Generation via Personalized Representation
by: Wang, Junyi, et al.
Published: (2026)

Exploring Vision-Language Models for Open-Vocabulary Zero-Shot Action Segmentation
by: Unmesh, Asim, et al.
Published: (2026)

Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation
by: Noori, Mehrdad, et al.
Published: (2025)

Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation
by: Chng, Yong Xien, et al.
Published: (2024)

Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking
by: Pätzold, Bastian, et al.
Published: (2025)

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)

Renovating Names in Open-Vocabulary Segmentation Benchmarks
by: Huang, Haiwen, et al.
Published: (2024)

Enhancing Open-Vocabulary Object Detection through Multi-Level Fine-Grained Visual-Language Alignment
by: Zhang, Tianyi, et al.
Published: (2026)

Semantic Alignment in Hyperbolic Space for Open-Vocabulary Semantic Segmentation
by: Truong, Hoang M., et al.
Published: (2026)

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)