| Main Authors: | Alsinglawi, Belal, McCarthy, Chris, Webb, Sara, Fluke, Christopher, Saidy, Navid Toosy |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.05575 |
Similar Items
DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification
by: Huang, Chihan, et al.
Published: (2025)
Linked Adapters: Linking Past and Future to Present for Effective Continual Learning
by: Chandra, Dupati Srikar, et al.
Published: (2024)
SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
by: Sivakumar, Anushka, et al.
Published: (2025)
Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models
by: Zhang, Weidong, et al.
Published: (2025)
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models
by: Khanal, Bidur, et al.
Published: (2025)
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
by: Rajabi, Navid, et al.
Published: (2023)
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM
by: Rajabi, Navid, et al.
Published: (2024)
Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)
A Lightweight Neural Architecture Search Model for Medical Image Classification
by: Xie, Lunchen, et al.
Published: (2024)
Improving Position Encoding of Transformers for Multivariate Time Series Classification
by: Foumani, Navid Mohammadi, et al.
Published: (2023)
A Scalable Machine Learning Pipeline for Building Footprint Detection in Historical Maps
by: McCarthy, Annemarie
Published: (2025)
Patent Figure Classification using Large Vision-language Models
by: Awale, Sushil, et al.
Published: (2025)
neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing
by: Perrett, Toby, et al.
Published: (2026)
Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images
by: Tan, Jen Hong
Published: (2024)
Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?
by: Cekmeceli, Kerem, et al.
Published: (2024)
GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
by: Rajabi, Navid, et al.
Published: (2024)
A Lightweight Medical Image Classification Framework via Self-Supervised Contrastive Learning and Quantum-Enhanced Feature Modeling
by: Xia, Jingsong, et al.
Published: (2026)
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
by: Das, Aryan, et al.
Published: (2026)
LightMedSeg: Lightweight 3D Medical Image Segmentation with Learned Spatial Anchors
by: Tyagi, Kavyansh, et al.
Published: (2026)
Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)
Visual Modality Prompt for Adapting Vision-Language Object Detectors
by: Medeiros, Heitor R., et al.
Published: (2024)
Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
by: Zhu, Xun, et al.
Published: (2026)
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
by: Huang, Wenxuan, et al.
Published: (2024)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
by: Xia, Peng, et al.
Published: (2024)
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
by: Li, Yue, et al.
Published: (2025)
Multimodal Autoregressive Pre-training of Large Vision Encoders
by: Fini, Enrico, et al.
Published: (2024)
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
by: Cao, Weiwei, et al.
Published: (2025)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
by: Khan, Behraj, et al.
Published: (2025)
IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration
by: Boussot, Valentin, et al.
Published: (2025)
An Efficient Medical Image Classification Method Based on a Lightweight Improved ConvNeXt-Tiny Architecture
by: Xia, Jingsong, et al.
Published: (2025)
HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis
by: Mabrok, Mohamed A
Published: (2026)
Medical Vision Language Models as Policies for Robotic Surgery
by: Muppidi, Akshay, et al.
Published: (2025)
Bridging Compressed Image Latents and Multimodal Large Language Models
by: Kao, Chia-Hao, et al.
Published: (2024)
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)
LoG-VMamba: Local-Global Vision Mamba for Medical Image Segmentation
by: Dang, Trung Dinh Quoc, et al.
Published: (2024)
A Simple Data Augmentation Strategy for Text-in-Image Scientific VQA
by: Shoer, Belal, et al.
Published: (2025)
Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
by: Chen, Yufan, et al.
Published: (2025)
Multitask Multimodal Self-Supervised Learning for Medical Images
by: Simionescu, Cristian
Published: (2025)
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
by: Yan, Hao, et al.
Published: (2024)