:: Library Catalog

Bibliographic Details
Main Authors:	Xi, Suyang, Yang, Chenxi, Ding, Hong, Ni, Yiqing, Liu, Catherine C., Liu, Yunhao, Zhang, Chengqi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.10426

Similar Items

Multimodal Medical Image Binding via Shared Text Embeddings
by: Liu, Yunhao, et al.
Published: (2025)

Retrieval Augmented Comic Image Generation
by: Shui, Yunhao, et al.
Published: (2025)

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)

Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner
by: Cai, Pengxiang, et al.
Published: (2024)

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
by: Liu, Ziyu, et al.
Published: (2024)

RelativeFlow: Taming Medical Image Denoising Learning with Noisy Reference
by: Liu, Yuxin, et al.
Published: (2026)

ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
by: Yang, Liu, et al.
Published: (2025)

SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read
by: Peng, Yibo, et al.
Published: (2026)

T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
by: Sun, Kaiyue, et al.
Published: (2025)

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
by: Huang, Zhe, et al.
Published: (2025)

Retrieval Augmented Image Harmonization
by: Wang, Haolin, et al.
Published: (2024)

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities
by: Liu, Huan, et al.
Published: (2024)

FreeRet: MLLMs as Training-Free Retrievers
by: Zhu, Yuhan, et al.
Published: (2025)

UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
by: Guo, Qin, et al.
Published: (2025)

OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner
by: Jiang, Haoyang, et al.
Published: (2026)

ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
by: Xu, Zitong, et al.
Published: (2026)

Open Multimodal Retrieval-Augmented Factual Image Generation
by: Tian, Yang, et al.
Published: (2025)

Retrieval Augmented Recipe Generation
by: Liu, Guoshan, et al.
Published: (2024)

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
by: Wang, Baisen, et al.
Published: (2024)

Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
by: Song, Chenxi, et al.
Published: (2025)

RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation
by: Ling, Run, et al.
Published: (2025)

FunBench: Benchmarking Fundus Reading Skills of MLLMs
by: Wei, Qijie, et al.
Published: (2025)

MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)

Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning
by: You, Xiaoxing, et al.
Published: (2025)

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
by: Qi, Jingyuan, et al.
Published: (2025)

Taming Generative Diffusion Prior for Universal Blind Image Restoration
by: Tu, Siwei, et al.
Published: (2024)

Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification
by: Yan, Renao, et al.
Published: (2023)

Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)

Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
by: Zhu, Fangrui, et al.
Published: (2025)

Image Generation Diversity Issues and How to Tame Them
by: Dombrowski, Mischa, et al.
Published: (2024)

ColorFlow: Retrieval-Augmented Image Sequence Colorization
by: Zhuang, Junhao, et al.
Published: (2024)

MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)

ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
by: Wang, Zikai, et al.
Published: (2026)

TACO: Taming Diffusion for in-the-wild Video Amodal Completion
by: Lu, Ruijie, et al.
Published: (2025)

Adapting MLLMs for Nuanced Video Retrieval
by: Bagad, Piyush, et al.
Published: (2025)

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
by: Pei, Yuhan, et al.
Published: (2024)

U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
by: Li, Xiaojie, et al.
Published: (2025)

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
by: Tong, Lei, et al.
Published: (2025)

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
by: Liu, Yujia, et al.
Published: (2024)

SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs
by: Lou, Haoran, et al.
Published: (2026)