:: Library Catalog

Bibliographic Details
Main Authors:	Ismithdeen, Mohamed Insaf; Khattak, Muhammad Uzair; Khan, Salman
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.03986

Similar Items

AIN: The Arabic INclusive Large Multimodal Model
by: Heakl, Ahmed, et al.
Published: (2025)

Compositional Chain-of-Thought Prompting for Large Multimodal Models
by: Mitra, Chancharik, et al.
Published: (2023)

ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2025)

Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
by: Lin, Yuanze, et al.
Published: (2024)

WorldCache: Content-Aware Caching for Accelerated Video World Models
by: Nawaz, Umair, et al.
Published: (2026)

Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
by: Cai, Mu, et al.
Published: (2023)

By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
by: Yoon, Hyungjun, et al.
Published: (2024)

A Multimodal Memes Classification: A Survey and Open Research Issues
by: Afridi, Tariq Habib, et al.
Published: (2020)

A Survey on Multimodal Large Language Models
by: Yin, Shukang, et al.
Published: (2023)

Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection
by: Aqeel, Muhammad, et al.
Published: (2026)

Visual Question Decomposition on Multimodal Large Language Models
by: Zhang, Haowei, et al.
Published: (2024)

Woodpecker: Hallucination Correction for Multimodal Large Language Models
by: Yin, Shukang, et al.
Published: (2023)

Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
by: Lee, Sua, et al.
Published: (2025)

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
by: Wang, Hengyi, et al.
Published: (2024)

Many-Shot In-Context Learning in Multimodal Foundation Models
by: Jiang, Yixing, et al.
Published: (2024)

Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)

Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)

A Concept-Based Explainability Framework for Large Multimodal Models
by: Parekh, Jayneel, et al.
Published: (2024)

Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)

MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion
by: Jiang, Ruixiang, et al.
Published: (2024)

Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
by: Hong, Jindong, et al.
Published: (2025)

Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
by: Miyai, Atsuyuki, et al.
Published: (2024)

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
by: Ma, Chuofan, et al.
Published: (2024)

mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)

Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning
by: Zhou, Guanglin, et al.
Published: (2024)

Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
by: Liu, Chenxi, et al.
Published: (2025)

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2025)

Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
by: Lau, Gregory Kang Ruey, et al.
Published: (2026)

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
by: Yiu, Eunice, et al.
Published: (2024)

SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
by: Horawalavithana, Sameera, et al.
Published: (2023)

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
by: Deng, Shijian, et al.
Published: (2024)

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)

Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras
by: Hong, Jindong, et al.
Published: (2025)

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
by: Chen, Zijun, et al.
Published: (2024)

MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)