:: Library Catalog

Bibliographic Details
Main Authors:	Suharitdamrong, Wish; Alex, Tony; Awais, Muhammad; Ahmed, Sara
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2604.03314

Similar Items

PAL: Probing Audio Encoders via LLMs -- Audio Information Transfer into LLMs
by: Alex, Tony, et al.
Published: (2025)

Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis
by: Marikkar, Umar, et al.
Published: (2026)

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
by: Wu, Jialin, et al.
Published: (2023)

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
by: Li, Siwei, et al.
Published: (2024)

CoLA: Collaborative Low-Rank Adaptation
by: Zhou, Yiyun, et al.
Published: (2025)

CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
by: Hao, Shuang, et al.
Published: (2024)

DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
by: Bini, Massimo, et al.
Published: (2025)

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
by: Yang, Yibo, et al.
Published: (2025)

Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization
by: Zhang, Yanghai, et al.
Published: (2024)

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models
by: Li, Changqun, et al.
Published: (2024)

CROME: Cross-Modal Adapters for Efficient Multimodal LLM
by: Ebrahimi, Sayna, et al.
Published: (2024)

Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)

LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
by: Asgarov, Ali, et al.
Published: (2024)

Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning
by: Lai, Songning, et al.
Published: (2023)

$\mathcal{V}isi\mathcal{P}runer$: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
by: Fan, Yingqi, et al.
Published: (2025)

CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training
by: Li, Qi, et al.
Published: (2026)

CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning
by: Mandalika, Sriram
Published: (2026)

Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
by: Zhang, Yuhui, et al.
Published: (2024)

Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts
by: Zhu, Zhihao, et al.
Published: (2026)

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)

MoExtend: Tuning New Experts for Modality and Task Extension
by: Zhong, Shanshan, et al.
Published: (2024)

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
by: Ashraf, Tajamul, et al.
Published: (2025)

CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks
by: Wang, Yanan, et al.
Published: (2025)

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
by: Cheng, Chunyang, et al.
Published: (2025)

Efficient Stitchable Task Adaptation
by: He, Haoyu, et al.
Published: (2023)

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
by: Wang, Yabing, et al.
Published: (2024)

MANTA: Cross-Modal Semantic Alignment and Information-Theoretic Optimization for Long-form Multimodal Understanding
by: Zhong, Ziqi, et al.
Published: (2025)

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space
by: Verma, Gaurav, et al.
Published: (2024)

DoRA: Weight-Decomposed Low-Rank Adaptation
by: Liu, Shih-Yang, et al.
Published: (2024)

TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
by: Miao, Daiye, et al.
Published: (2025)

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
by: Guo, Zirun, et al.
Published: (2024)

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model
by: Cahyawijaya, Samuel, et al.
Published: (2026)

Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
by: Nguyen, Thi Huyen, et al.
Published: (2026)

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)

Cross-Modal Retrieval for Motion and Text via DropTriple Loss
by: Yan, Sheng, et al.
Published: (2023)

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
by: Zhu, Tinghui, et al.
Published: (2024)

Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
by: Uebayashi, Shunki, et al.
Published: (2026)

Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
by: Gigant, Théo, et al.
Published: (2025)

Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
by: Hossain, Md. Mithun, et al.
Published: (2025)

Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)