| Main Authors: | Suharitdamrong, Wish; Alex, Tony; Awais, Muhammad; Ahmed, Sara |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03314 |
Similar Items
PAL: Probing Audio Encoders via LLMs -- Audio Information Transfer into LLMs
by: Alex, Tony, et al.
Published: (2025)
Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis
by: Marikkar, Umar, et al.
Published: (2026)
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
by: Wu, Jialin, et al.
Published: (2023)
Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
by: Li, Siwei, et al.
Published: (2024)
CoLA: Collaborative Low-Rank Adaptation
by: Zhou, Yiyun, et al.
Published: (2025)
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
by: Hao, Shuang, et al.
Published: (2024)
DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
by: Bini, Massimo, et al.
Published: (2025)
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
by: Yang, Yibo, et al.
Published: (2025)
Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization
by: Zhang, Yanghai, et al.
Published: (2024)
Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models
by: Li, Changqun, et al.
Published: (2024)
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
by: Ebrahimi, Sayna, et al.
Published: (2024)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
by: Asgarov, Ali, et al.
Published: (2024)
Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning
by: Lai, Songning, et al.
Published: (2023)
$\mathcal{V}isi\mathcal{P}runer$: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
by: Fan, Yingqi, et al.
Published: (2025)
CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training
by: Li, Qi, et al.
Published: (2026)
CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning
by: Mandalika, Sriram
Published: (2026)
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
by: Zhang, Yuhui, et al.
Published: (2024)
Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts
by: Zhu, Zhihao, et al.
Published: (2026)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)
MoExtend: Tuning New Experts for Modality and Task Extension
by: Zhong, Shanshan, et al.
Published: (2024)
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
by: Ashraf, Tajamul, et al.
Published: (2025)
CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks
by: Wang, Yanan, et al.
Published: (2025)
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
by: Cheng, Chunyang, et al.
Published: (2025)
Efficient Stitchable Task Adaptation
by: He, Haoyu, et al.
Published: (2023)
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
by: Wang, Yabing, et al.
Published: (2024)
MANTA: Cross-Modal Semantic Alignment and Information-Theoretic Optimization for Long-form Multimodal Understanding
by: Zhong, Ziqi, et al.
Published: (2025)
Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space
by: Verma, Gaurav, et al.
Published: (2024)
DoRA: Weight-Decomposed Low-Rank Adaptation
by: Liu, Shih-Yang, et al.
Published: (2024)
TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
by: Miao, Daiye, et al.
Published: (2025)
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
by: Guo, Zirun, et al.
Published: (2024)
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model
by: Cahyawijaya, Samuel, et al.
Published: (2026)
Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
by: Nguyen, Thi Huyen, et al.
Published: (2026)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
Cross-Modal Retrieval for Motion and Text via DropTriple Loss
by: Yan, Sheng, et al.
Published: (2023)
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
by: Zhu, Tinghui, et al.
Published: (2024)
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
by: Uebayashi, Shunki, et al.
Published: (2026)
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
by: Gigant, Théo, et al.
Published: (2025)
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
by: Hossain, Md. Mithun, et al.
Published: (2025)
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)