| Main Authors: | Shi, Yi, Meng, Wenlong, Guo, Zhenyuan, Wei, Chengkun, Chen, Wenzhi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.11126 |
Similar Items
SignBind-LLM: Multi-Stage Modality Fusion for Sign Language Translation
by: Thomas, Marshall, et al.
Published: (2025)
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
by: Guo, Zirun, et al.
Published: (2024)
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy
by: Guo, Zhenyuan, et al.
Published: (2025)
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
by: Yang, Rui, et al.
Published: (2025)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
by: Xu, Runsen, et al.
Published: (2025)
MMViR: A Multi-Modal and Multi-Granularity Representation for Long-range Video Understanding
by: Li, Zizhong, et al.
Published: (2026)
Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
by: Qian, Kai, et al.
Published: (2026)
Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
by: Chen, Yingfa, et al.
Published: (2024)
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal
by: Nie, Yiqi, et al.
Published: (2026)
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
by: Ye, Hanrong, et al.
Published: (2025)
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
by: Chen, Yi-Chia, et al.
Published: (2024)
MMGR: Multi-Modal Generative Reasoning
by: Cai, Zefan, et al.
Published: (2025)
Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis
by: Yeh, Chun-Hsiao, et al.
Published: (2024)
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
by: Han, Feng, et al.
Published: (2026)
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
by: Yang, Qize, et al.
Published: (2025)
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
by: Jiang, Chen, et al.
Published: (2023)
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
by: Liu, Yi, et al.
Published: (2024)
A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition
by: Sun, Jiajun, et al.
Published: (2026)
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)
Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching
by: Hazman, Muzhaffar, et al.
Published: (2025)
Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation
by: Chung, Tien-Dat, et al.
Published: (2025)
Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
by: Hossain, Md. Mithun, et al.
Published: (2025)
Text-centric Alignment for Multi-Modality Learning
by: Tsai, Yun-Da, et al.
Published: (2024)
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
by: Yang, Honglong, et al.
Published: (2025)
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
by: Zhang, Yuechen, et al.
Published: (2023)
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
by: Liang, Weixin, et al.
Published: (2025)
Modularized Networks for Few-shot Hateful Meme Detection
by: Cao, Rui, et al.
Published: (2024)
A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
by: Liu, Jie, et al.
Published: (2024)
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
by: Cao, Guiming, et al.
Published: (2024)
Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)
PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue
by: Vorontsov, Eugene, et al.
Published: (2025)
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
by: Guo, Ziyu, et al.
Published: (2025)
Revisiting Multi-Modal LLM Evaluation
by: Lu, Jian, et al.
Published: (2024)
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models
by: Ju, Tianjie, et al.
Published: (2025)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)
Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions
by: Zhang, Xiao, et al.
Published: (2025)
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks
by: Li, Qian, et al.
Published: (2024)