:: Library Catalog

Bibliographic Details
Main Authors:	Madaan, Divyam; Muhunthan, Varshan; Cho, Kyunghyun; Chopra, Sumit
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.23499

Similar Items

Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning
by: Madaan, Divyam, et al.
Published: (2024)

Temporal Generalization: A Reality Check
by: Madaan, Divyam, et al.
Published: (2025)

Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling
by: Madaan, Divyam, et al.
Published: (2026)

HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
by: Huang, Haoxu, et al.
Published: (2024)

BloomVQA: Assessing Hierarchical Multi-modal Comprehension
by: Gong, Yunye, et al.
Published: (2023)

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
by: Shi, Haojun, et al.
Published: (2024)

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
by: Zhang, Letian, et al.
Published: (2023)

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
by: Liu, Dongyang, et al.
Published: (2024)

A training regime to learn unified representations from complementary breast imaging modalities
by: Sharma, Umang, et al.
Published: (2024)

GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
by: Wang, Enguang, et al.
Published: (2024)

Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
by: Wang, Yimu, et al.
Published: (2025)

ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?
by: Meshram, Pragati Shuddhodhan, et al.
Published: (2024)

MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
by: Kumar, Somnath, et al.
Published: (2024)

Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
by: Bai, Andrew, et al.
Published: (2025)

Multi-level and Multi-modal Action Anticipation
by: Kim, Seulgi, et al.
Published: (2025)

MIS-ME: A Multi-modal Framework for Soil Moisture Estimation
by: Rakib, Mohammed, et al.
Published: (2024)

Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
by: Yoon, Heegeon, et al.
Published: (2026)

Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
by: Li, Shengzhi, et al.
Published: (2024)

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
by: He, Hulingxiao, et al.
Published: (2025)

Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
by: Kim, Donghoon, et al.
Published: (2024)

From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning
by: Ni, Hang, et al.
Published: (2026)

Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
by: Cho, Seunghyuk, et al.
Published: (2025)

Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality
by: Wang, Hu, et al.
Published: (2023)

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
by: Zhang, Renrui, et al.
Published: (2024)

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)

Towards Multi-modal Transformers in Federated Learning
by: Sun, Guangyu, et al.
Published: (2024)

Multi-modal learning for geospatial vegetation forecasting
by: Benson, Vitus, et al.
Published: (2023)

Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations
by: Qu, Linhao, et al.
Published: (2024)

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
by: Wang, Yanbo, et al.
Published: (2025)

Multi-modal Representation Learning for Cross-modal Prediction of Continuous Weather Patterns from Discrete Low-Dimensional Data
by: Qayyum, Alif Bin Abdul, et al.
Published: (2024)

Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment
by: Zhang, Ming, et al.
Published: (2024)

Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)

FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
by: Huang, Chengyue, et al.
Published: (2025)

cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
by: Kolodiazhnyi, Maksim, et al.
Published: (2025)

Multi-level Cross-modal Alignment for Image Clustering
by: Qiu, Liping, et al.
Published: (2024)

Fairness in Multi-modal Medical Diagnosis with Demonstration Selection
by: Li, Dawei, et al.
Published: (2025)

Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)

Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration
by: Mena, Francisco, et al.
Published: (2025)

On the Multi-modal Vulnerability of Diffusion Models
by: Yang, Dingcheng, et al.
Published: (2024)