| Main Authors: | Ismithdeen, Mohamed Insaf, Khattak, Muhammad Uzair, Khan, Salman |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.03986 |
Similar Items
AIN: The Arabic INclusive Large Multimodal Model
by: Heakl, Ahmed, et al.
Published: (2025)
Compositional Chain-of-Thought Prompting for Large Multimodal Models
by: Mitra, Chancharik, et al.
Published: (2023)
ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2025)
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
by: Lin, Yuanze, et al.
Published: (2024)
WorldCache: Content-Aware Caching for Accelerated Video World Models
by: Nawaz, Umair, et al.
Published: (2026)
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
by: Cai, Mu, et al.
Published: (2023)
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
by: Yoon, Hyungjun, et al.
Published: (2024)
A Multimodal Memes Classification: A Survey and Open Research Issues
by: Afridi, Tariq Habib, et al.
Published: (2020)
A Survey on Multimodal Large Language Models
by: Yin, Shukang, et al.
Published: (2023)
Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection
by: Aqeel, Muhammad, et al.
Published: (2026)
Visual Question Decomposition on Multimodal Large Language Models
by: Zhang, Haowei, et al.
Published: (2024)
Woodpecker: Hallucination Correction for Multimodal Large Language Models
by: Yin, Shukang, et al.
Published: (2023)
Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
by: Lee, Sua, et al.
Published: (2025)
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
by: Wang, Hengyi, et al.
Published: (2024)
Many-Shot In-Context Learning in Multimodal Foundation Models
by: Jiang, Yixing, et al.
Published: (2024)
Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)
A Concept-Based Explainability Framework for Large Multimodal Models
by: Parekh, Jayneel, et al.
Published: (2024)
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)
MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion
by: Jiang, Ruixiang, et al.
Published: (2024)
Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
by: Hong, Jindong, et al.
Published: (2025)
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
by: Miyai, Atsuyuki, et al.
Published: (2024)
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
by: Ma, Chuofan, et al.
Published: (2024)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning
by: Zhou, Guanglin, et al.
Published: (2024)
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
by: Liu, Chenxi, et al.
Published: (2025)
Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2025)
Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
by: Lau, Gregory Kang Ruey, et al.
Published: (2026)
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
by: Yiu, Eunice, et al.
Published: (2024)
SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
by: Horawalavithana, Sameera, et al.
Published: (2023)
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
by: Deng, Shijian, et al.
Published: (2024)
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)
Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras
by: Hong, Jindong, et al.
Published: (2025)
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
by: Chen, Zijun, et al.
Published: (2024)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)