| Main Authors: | Yang, Yi; Zhang, Qingwen; Ikemura, Kei; Batool, Nazre; Folkesson, John |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.20991 |
Similar Items
AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
by: Yang, Yi, et al.
Published: (2025)
Score-Based Multibeam Point Cloud Denoising
by: Ling, Li, et al.
Published: (2024)
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
by: Cui, Peng, et al.
Published: (2024)
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
by: Ronecker, Max Peter, et al.
Published: (2025)
Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models
by: Huang, Xin, et al.
Published: (2025)
Towards Long-Range 3D Object Detection for Autonomous Vehicles
by: Khoche, Ajinkya, et al.
Published: (2023)
Sherlock: Self-Correcting Reasoning in Vision-Language Models
by: Ding, Yi, et al.
Published: (2025)
RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models
by: Zhang, Serena, et al.
Published: (2024)
FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing
by: Corley, Isaac, et al.
Published: (2025)
CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification
by: Yang, Yang, et al.
Published: (2025)
Detecting and Preventing Hallucinations in Large Vision Language Models
by: Gunjal, Anisha, et al.
Published: (2023)
CFM: Language-aligned Concept Foundation Model for Vision
by: Wittenmayer, Kai, et al.
Published: (2026)
Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model
by: Cao, Bin, et al.
Published: (2025)
Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization
by: Kawaharazuka, Kento, et al.
Published: (2024)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
by: Khan, Behraj, et al.
Published: (2025)
MoFM: A Large-Scale Human Motion Foundation Model
by: Baharani, Mohammadreza, et al.
Published: (2025)
Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
by: Xu, Yi, et al.
Published: (2025)
VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
by: Bi, Tianci, et al.
Published: (2025)
Pedestrian Intention Prediction via Vision-Language Foundation Models
by: Azarmi, Mohsen, et al.
Published: (2025)
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
by: Ding, Yi, et al.
Published: (2024)
Vision Foundation Models in Remote Sensing: A Survey
by: Lu, Siqi, et al.
Published: (2024)
Revisiting Active Learning in the Era of Vision Foundation Models
by: Gupte, Sanket Rajan, et al.
Published: (2024)
Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
by: Pan, Bikang, et al.
Published: (2024)
Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations
by: Babadi, Narges, et al.
Published: (2026)
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
by: Bao, Chen, et al.
Published: (2024)
Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models
by: Peng, Bo, et al.
Published: (2026)
Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models
by: Zhang, Yi, et al.
Published: (2026)
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
by: Jiang, Xue, et al.
Published: (2024)
Bridging Vision and Language Spaces with Assignment Prediction
by: Park, Jungin, et al.
Published: (2024)
Learning Self-Correction in Vision-Language Models via Rollout Augmentation
by: Ding, Yi, et al.
Published: (2026)
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
by: Fu, Deqing, et al.
Published: (2024)
Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)
You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models
by: Slyman, Eric, et al.
Published: (2024)
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
by: Yi, Chao, et al.
Published: (2024)
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
by: Zhao, Yiwei, et al.
Published: (2026)
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
by: Zhou, Andy, et al.
Published: (2023)
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
by: Yamaguchi, Shin'ya, et al.
Published: (2025)
Neutral-Reference Prompting for Vision-Language Models
by: Tian, Senmao, et al.
Published: (2026)
HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
by: Zohrabi, Reihaneh, et al.
Published: (2026)
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
by: Ma, Huan, et al.
Published: (2024)