Library Catalog

Bibliographic Details
Main Authors:	Alsinglawi, Belal; McCarthy, Chris; Webb, Sara; Fluke, Christopher; Saidy, Navid Toosy
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2504.05575
Similar Items

DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification
by: Huang, Chihan, et al.
Published: (2025)

Linked Adapters: Linking Past and Future to Present for Effective Continual Learning
by: Chandra, Dupati Srikar, et al.
Published: (2024)

SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
by: Sivakumar, Anushka, et al.
Published: (2025)

Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models
by: Zhang, Weidong, et al.
Published: (2025)

Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models
by: Khanal, Bidur, et al.
Published: (2025)

Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
by: Rajabi, Navid, et al.
Published: (2023)

Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM
by: Rajabi, Navid, et al.
Published: (2024)

Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)

A Lightweight Neural Architecture Search Model for Medical Image Classification
by: Xie, Lunchen, et al.
Published: (2024)

Improving Position Encoding of Transformers for Multivariate Time Series Classification
by: Foumani, Navid Mohammadi, et al.
Published: (2023)

A Scalable Machine Learning Pipeline for Building Footprint Detection in Historical Maps
by: McCarthy, Annemarie
Published: (2025)

Patent Figure Classification using Large Vision-language Models
by: Awale, Sushil, et al.
Published: (2025)

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing
by: Perrett, Toby, et al.
Published: (2026)

Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images
by: Tan, Jen Hong
Published: (2024)

Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?
by: Cekmeceli, Kerem, et al.
Published: (2024)

GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
by: Rajabi, Navid, et al.
Published: (2024)

A Lightweight Medical Image Classification Framework via Self-Supervised Contrastive Learning and Quantum-Enhanced Feature Modeling
by: Xia, Jingsong, et al.
Published: (2026)

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
by: Das, Aryan, et al.
Published: (2026)

LightMedSeg: Lightweight 3D Medical Image Segmentation with Learned Spatial Anchors
by: Tyagi, Kavyansh, et al.
Published: (2026)

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)

Visual Modality Prompt for Adapting Vision-Language Object Detectors
by: Medeiros, Heitor R., et al.
Published: (2024)

Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
by: Zhu, Xun, et al.
Published: (2026)

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
by: Huang, Wenxuan, et al.
Published: (2024)

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
by: Xia, Peng, et al.
Published: (2024)

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
by: Li, Yue, et al.
Published: (2025)

Multimodal Autoregressive Pre-training of Large Vision Encoders
by: Fini, Enrico, et al.
Published: (2024)

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
by: Cao, Weiwei, et al.
Published: (2025)

Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
by: Khan, Behraj, et al.
Published: (2025)

IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration
by: Boussot, Valentin, et al.
Published: (2025)

An Efficient Medical Image Classification Method Based on a Lightweight Improved ConvNeXt-Tiny Architecture
by: Xia, Jingsong, et al.
Published: (2025)

HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis
by: Mabrok, Mohamed A
Published: (2026)

Medical Vision Language Models as Policies for Robotic Surgery
by: Muppidi, Akshay, et al.
Published: (2025)

Bridging Compressed Image Latents and Multimodal Large Language Models
by: Kao, Chia-Hao, et al.
Published: (2024)

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)

LoG-VMamba: Local-Global Vision Mamba for Medical Image Segmentation
by: Dang, Trung Dinh Quoc, et al.
Published: (2024)

A Simple Data Augmentation Strategy for Text-in-Image Scientific VQA
by: Shoer, Belal, et al.
Published: (2025)

Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
by: Chen, Yufan, et al.
Published: (2025)

Multitask Multimodal Self-Supervised Learning for Medical Images
by: Simionescu, Cristian
Published: (2025)

Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
by: Yan, Hao, et al.
Published: (2024)