| Main Authors: | Peng, Bo, Hu, Yuanwei, Liu, Bo, Chen, Ling, Lu, Jie, Fang, Zhen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09586 |
Similar Items
Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models
by: Hu, Yuanwei, et al.
Published: (2026)
On the Provable Importance of Gradients for Language-Assisted Image Clustering
by: Peng, Bo, et al.
Published: (2025)
Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models
by: Peng, Bo, et al.
Published: (2026)
Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
by: Liu, Shizhan, et al.
Published: (2025)
Delving into Out-of-Distribution Detection with Medical Vision-Language Models
by: Ju, Lie, et al.
Published: (2025)
CELLO: Causal Evaluation of Large Vision-Language Models
by: Chen, Meiqi, et al.
Published: (2024)
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
by: Jiang, Xue, et al.
Published: (2024)
Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)
On the Learnability of Out-of-distribution Detection
by: Fang, Zhen, et al.
Published: (2024)
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
by: Zhao, Yuzhong, et al.
Published: (2024)
Multi-Token Enhancing for Vision Representation Learning
by: Li, Zhong-Yu, et al.
Published: (2024)
Delving Deep into Semantic Relation Distillation
by: Yan, Zhaoyi, et al.
Published: (2025)
MedSAM3: Delving into Segment Anything with Medical Concepts
by: Liu, Anglin, et al.
Published: (2025)
FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model
by: Li, Yuanwei, et al.
Published: (2024)
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
by: Du, Hao, et al.
Published: (2025)
HSCP: A Two-Stage Spectral Clustering Framework for Resource-Constrained UAV Identification
by: Wang, Maoyu, et al.
Published: (2025)
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
by: Wang, Jiayu, et al.
Published: (2024)
Beyond Language: Grounding Referring Expressions with Hand Pointing in Egocentric Vision
by: Li, Ling, et al.
Published: (2026)
SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images
by: Si, Dongchen, et al.
Published: (2025)
Delving into Mapping Uncertainty for Mapless Trajectory Prediction
by: Zhang, Zongzheng, et al.
Published: (2025)
CLIPSym: Delving into Symmetry Detection with CLIP
by: Yang, Tinghan, et al.
Published: (2025)
A Generative Framework for Self-Supervised Facial Representation Learning
by: He, Ruian, et al.
Published: (2023)
Hyperspectral Image Classification via Efficient Global Spectral Supertoken Clustering
by: Liu, Peifu, et al.
Published: (2026)
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
by: Xia, Peng, et al.
Published: (2023)
Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
by: Ge, Jiawei, et al.
Published: (2023)
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
by: Chen, Sijia, et al.
Published: (2024)
OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks
by: Fu, Ronghao, et al.
Published: (2026)
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
by: Wang, Jiahao, et al.
Published: (2025)
Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification
by: Li, Yiqiao, et al.
Published: (2026)
Delving into Dark Regions for Robust Shadow Detection
by: Guan, Huankang, et al.
Published: (2024)
CAS-IQA: Teaching Vision-Language Models for Synthetic Angiography Quality Assessment
by: Wang, Bo, et al.
Published: (2025)
GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding
by: Liu, Jiaqi, et al.
Published: (2025)
Backdooring Vision-Language Models with Out-Of-Distribution Data
by: Lyu, Weimin, et al.
Published: (2024)
Dynamic Rank Adaptation for Vision-Language Models
by: Wang, Jiahui, et al.
Published: (2025)
TrojVLM: Backdoor Attack Against Vision Language Models
by: Lyu, Weimin, et al.
Published: (2024)
From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model
by: Hu, Bing, et al.
Published: (2026)
SR$^{2}$-Net: A General Plug-and-Play Model for Spectral Refinement in Hyperspectral Image Super-Resolution
by: He, Ji-Xuan, et al.
Published: (2026)
Balancing Complementarity and Consistency via Delayed Activation in Incomplete Multi-view Clustering
by: Li, Bo
Published: (2024)
MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection
by: Shi, Kuo, et al.
Published: (2025)