:: Library Catalog

Bibliographic Details
Main Authors:	Yin, Haojie, Feng, Chengcheng, Liu, Tianyi, Zhang, Tianqi, Huang, Kaizhu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.26513

Similar Items

Towards Faithful Reasoning in Comics for Small MLLMs
by: Feng, Chengcheng, et al.
Published: (2026)

M3: 3D-Spatial MultiModal Memory
by: Zou, Xueyan, et al.
Published: (2025)

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)

MultiModal Action Conditioned Video Generation
by: Li, Yichen, et al.
Published: (2025)

MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)

ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
by: Sastry, Srikumar, et al.
Published: (2025)

M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
by: Panta, Sanjeev, et al.
Published: (2026)

M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
by: Kong, Chenqi, et al.
Published: (2023)

ControlEdit: A MultiModal Local Clothing Image Editing Method
by: Cheng, Di, et al.
Published: (2024)

CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
by: Wang, Qijie, et al.
Published: (2024)

LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models
by: Qiu, Han, et al.
Published: (2024)

MMA-Diffusion: MultiModal Attack on Diffusion Models
by: Yang, Yijun, et al.
Published: (2023)

M²CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
by: Liu, Ziyuan, et al.
Published: (2025)

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots
by: Liu, Tianyu, et al.
Published: (2025)

Mind the Gap: Promoting Missing Modality Brain Tumor Segmentation with Alignment
by: Liu, Tianyi, et al.
Published: (2024)

DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation
by: Lan, Libin, et al.
Published: (2025)

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
by: Li, Zijie, et al.
Published: (2026)

MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)

MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning
by: Zheng, Xuhui, et al.
Published: (2025)

HAMMR: HierArchical MultiModal React agents for generic VQA
by: Castrejon, Lluis, et al.
Published: (2024)

Rethinking Information Loss in Medical Image Segmentation with Various-sized Targets
by: Liu, Tianyi, et al.
Published: (2024)

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment
by: Liu, Tianyi, et al.
Published: (2024)

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
by: Zou, Heqing, et al.
Published: (2024)

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
by: Zeng, Zhen, et al.
Published: (2024)

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)

Rebalancing Multi-Label Class-Incremental Learning
by: Du, Kaile, et al.
Published: (2024)

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
by: Chumachenko, Kateryna, et al.
Published: (2024)

GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
by: Zhang, Chengrui, et al.
Published: (2025)

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
by: Feng, Chengcheng, et al.
Published: (2024)

Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation
by: Shi, Jian, et al.
Published: (2025)

MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)

BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
by: Zhao, Weiguang, et al.
Published: (2025)

Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution
by: Lin, Shuchen, et al.
Published: (2025)

Vision Transformer based Random Walk for Group Re-Identification
by: Zhang, Guoqing, et al.
Published: (2024)

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
by: Yang, Jian, et al.
Published: (2024)

W-Net: One-Shot Arbitrary-Style Chinese Character Generation with Deep Neural Networks
by: Jiang, Haochuan, et al.
Published: (2024)

Rethinking Multi-domain Generalization with A General Learning Objective
by: Tan, Zhaorui, et al.
Published: (2024)

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
by: Yao, Wenfang, et al.
Published: (2024)