| Main Authors: | Yin, Haojie; Feng, Chengcheng; Liu, Tianyi; Zhang, Tianqi; Huang, Kaizhu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.26513 |
Similar Items
Towards Faithful Reasoning in Comics for Small MLLMs
by: Feng, Chengcheng, et al.
Published: (2026)
M3: 3D-Spatial MultiModal Memory
by: Zou, Xueyan, et al.
Published: (2025)
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)
MultiModal Action Conditioned Video Generation
by: Li, Yichen, et al.
Published: (2025)
MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)
ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
by: Sastry, Srikumar, et al.
Published: (2025)
M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
by: Panta, Sanjeev, et al.
Published: (2026)
M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
by: Kong, Chenqi, et al.
Published: (2023)
ControlEdit: A MultiModal Local Clothing Image Editing Method
by: Cheng, Di, et al.
Published: (2024)
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
by: Wang, Qijie, et al.
Published: (2024)
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models
by: Qiu, Han, et al.
Published: (2024)
MMA-Diffusion: MultiModal Attack on Diffusion Models
by: Yang, Yijun, et al.
Published: (2023)
M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
by: Liu, Ziyuan, et al.
Published: (2025)
TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots
by: Liu, Tianyu, et al.
Published: (2025)
Mind the Gap: Promoting Missing Modality Brain Tumor Segmentation with Alignment
by: Liu, Tianyi, et al.
Published: (2024)
DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation
by: Lan, Libin, et al.
Published: (2025)
MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
by: Li, Zijie, et al.
Published: (2026)
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)
MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning
by: Zheng, Xuhui, et al.
Published: (2025)
HAMMR: HierArchical MultiModal React agents for generic VQA
by: Castrejon, Lluis, et al.
Published: (2024)
Rethinking Information Loss in Medical Image Segmentation with Various-sized Targets
by: Liu, Tianyi, et al.
Published: (2024)
Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)
MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment
by: Liu, Tianyi, et al.
Published: (2024)
CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
by: Zou, Heqing, et al.
Published: (2024)
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
by: Zeng, Zhen, et al.
Published: (2024)
MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)
Rebalancing Multi-Label Class-Incremental Learning
by: Du, Kaile, et al.
Published: (2024)
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
by: Chumachenko, Kateryna, et al.
Published: (2024)
GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
by: Zhang, Chengrui, et al.
Published: (2025)
TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
by: Feng, Chengcheng, et al.
Published: (2024)
Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation
by: Shi, Jian, et al.
Published: (2025)
MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
by: Zhao, Weiguang, et al.
Published: (2025)
Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution
by: Lin, Shuchen, et al.
Published: (2025)
Vision Transformer based Random Walk for Group Re-Identification
by: Zhang, Guoqing, et al.
Published: (2024)
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
by: Yang, Jian, et al.
Published: (2024)
W-Net: One-Shot Arbitrary-Style Chinese Character Generation with Deep Neural Networks
by: Jiang, Haochuan, et al.
Published: (2024)
Rethinking Multi-domain Generalization with A General Learning Objective
by: Tan, Zhaorui, et al.
Published: (2024)
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
by: Yao, Wenfang, et al.
Published: (2024)