Library Catalog

Bibliographic Details
Main Authors:	Yang, Yi; Zhang, Qingwen; Ikemura, Kei; Batool, Nazre; Folkesson, John
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2405.20991

Similar Items

AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
by: Yang, Yi, et al.
Published: (2025)

Score-Based Multibeam Point Cloud Denoising
by: Ling, Li, et al.
Published: (2024)

Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
by: Cui, Peng, et al.
Published: (2024)

Vision Foundation Model Embedding-Based Semantic Anomaly Detection
by: Ronecker, Max Peter, et al.
Published: (2025)

Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models
by: Huang, Xin, et al.
Published: (2025)

Towards Long-Range 3D Object Detection for Autonomous Vehicles
by: Khoche, Ajinkya, et al.
Published: (2023)

Sherlock: Self-Correcting Reasoning in Vision-Language Models
by: Ding, Yi, et al.
Published: (2025)

RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models
by: Zhang, Serena, et al.
Published: (2024)

FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing
by: Corley, Isaac, et al.
Published: (2025)

CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification
by: Yang, Yang, et al.
Published: (2025)

Detecting and Preventing Hallucinations in Large Vision Language Models
by: Gunjal, Anisha, et al.
Published: (2023)

CFM: Language-aligned Concept Foundation Model for Vision
by: Wittenmayer, Kai, et al.
Published: (2026)

Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model
by: Cao, Bin, et al.
Published: (2025)

Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization
by: Kawaharazuka, Kento, et al.
Published: (2024)

Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
by: Khan, Behraj, et al.
Published: (2025)

MoFM: A Large-Scale Human Motion Foundation Model
by: Baharani, Mohammadreza, et al.
Published: (2025)

Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
by: Xu, Yi, et al.
Published: (2025)

VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
by: Bi, Tianci, et al.
Published: (2025)

Pedestrian Intention Prediction via Vision-Language Foundation Models
by: Azarmi, Mohsen, et al.
Published: (2025)

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
by: Ding, Yi, et al.
Published: (2024)

Vision Foundation Models in Remote Sensing: A Survey
by: Lu, Siqi, et al.
Published: (2024)

Revisiting Active Learning in the Era of Vision Foundation Models
by: Gupte, Sanket Rajan, et al.
Published: (2024)

Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
by: Pan, Bikang, et al.
Published: (2024)

Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations
by: Babadi, Narges, et al.
Published: (2026)

HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
by: Bao, Chen, et al.
Published: (2024)

Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models
by: Peng, Bo, et al.
Published: (2026)

Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models
by: Zhang, Yi, et al.
Published: (2026)

Negative Label Guided OOD Detection with Pretrained Vision-Language Models
by: Jiang, Xue, et al.
Published: (2024)

Bridging Vision and Language Spaces with Assignment Prediction
by: Park, Jungin, et al.
Published: (2024)

Learning Self-Correction in Vision-Language Models via Rollout Augmentation
by: Ding, Yi, et al.
Published: (2026)

TLDR: Token-Level Detective Reward Model for Large Vision Language Models
by: Fu, Deqing, et al.
Published: (2024)

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)

You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models
by: Slyman, Eric, et al.
Published: (2024)

Bridge the Modality and Capability Gaps in Vision-Language Model Selection
by: Yi, Chao, et al.
Published: (2024)

AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
by: Zhao, Yiwei, et al.
Published: (2026)

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
by: Zhou, Andy, et al.
Published: (2023)

Post-pre-training for Modality Alignment in Vision-Language Foundation Models
by: Yamaguchi, Shin'ya, et al.
Published: (2025)

Neutral-Reference Prompting for Vision-Language Models
by: Tian, Senmao, et al.
Published: (2026)

HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
by: Zohrabi, Reihaneh, et al.
Published: (2026)

Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
by: Ma, Huan, et al.
Published: (2024)