| Main Authors: | Madaan, Divyam, Muhunthan, Varshan, Cho, Kyunghyun, Chopra, Sumit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.23499 |
Similar Items
Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning
by: Madaan, Divyam, et al.
Published: (2024)
Temporal Generalization: A Reality Check
by: Madaan, Divyam, et al.
Published: (2025)
Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling
by: Madaan, Divyam, et al.
Published: (2026)
HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
by: Huang, Haoxu, et al.
Published: (2024)
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
by: Gong, Yunye, et al.
Published: (2023)
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
by: Shi, Haojun, et al.
Published: (2024)
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
by: Zhang, Letian, et al.
Published: (2023)
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
by: Liu, Dongyang, et al.
Published: (2024)
A training regime to learn unified representations from complementary breast imaging modalities
by: Sharma, Umang, et al.
Published: (2024)
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
by: Wang, Enguang, et al.
Published: (2024)
Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
by: Wang, Yimu, et al.
Published: (2025)
ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?
by: Meshram, Pragati Shuddhodhan, et al.
Published: (2024)
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
by: Kumar, Somnath, et al.
Published: (2024)
Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
by: Bai, Andrew, et al.
Published: (2025)
Multi-level and Multi-modal Action Anticipation
by: Kim, Seulgi, et al.
Published: (2025)
MIS-ME: A Multi-modal Framework for Soil Moisture Estimation
by: Rakib, Mohammed, et al.
Published: (2024)
Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
by: Yoon, Heegeon, et al.
Published: (2026)
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
by: Li, Shengzhi, et al.
Published: (2024)
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
by: He, Hulingxiao, et al.
Published: (2025)
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
by: Kim, Donghoon, et al.
Published: (2024)
From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning
by: Ni, Hang, et al.
Published: (2026)
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
by: Cho, Seunghyuk, et al.
Published: (2025)
Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality
by: Wang, Hu, et al.
Published: (2023)
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
by: Zhang, Renrui, et al.
Published: (2024)
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)
Towards Multi-modal Transformers in Federated Learning
by: Sun, Guangyu, et al.
Published: (2024)
Multi-modal learning for geospatial vegetation forecasting
by: Benson, Vitus, et al.
Published: (2023)
Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations
by: Qu, Linhao, et al.
Published: (2024)
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
by: Wang, Yanbo, et al.
Published: (2025)
Multi-modal Representation Learning for Cross-modal Prediction of Continuous Weather Patterns from Discrete Low-Dimensional Data
by: Qayyum, Alif Bin Abdul, et al.
Published: (2024)
Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment
by: Zhang, Ming, et al.
Published: (2024)
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
by: Huang, Chengyue, et al.
Published: (2025)
cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
by: Kolodiazhnyi, Maksim, et al.
Published: (2025)
Multi-level Cross-modal Alignment for Image Clustering
by: Qiu, Liping, et al.
Published: (2024)
Fairness in Multi-modal Medical Diagnosis with Demonstration Selection
by: Li, Dawei, et al.
Published: (2025)
Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)
Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration
by: Mena, Francisco, et al.
Published: (2025)
On the Multi-modal Vulnerability of Diffusion Models
by: Yang, Dingcheng, et al.
Published: (2024)