| Main Authors: | Xi, Suyang, Yang, Chenxi, Ding, Hong, Ni, Yiqing, Liu, Catherine C., Liu, Yunhao, Zhang, Chengqi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.10426 |
Similar Items
Multimodal Medical Image Binding via Shared Text Embeddings
by: Liu, Yunhao, et al.
Published: (2025)
Retrieval Augmented Comic Image Generation
by: Shui, Yunhao, et al.
Published: (2025)
IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)
Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner
by: Cai, Pengxiang, et al.
Published: (2024)
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
by: Liu, Ziyu, et al.
Published: (2024)
RelativeFlow: Taming Medical Image Denoising Learning with Noisy Reference
by: Liu, Yuxin, et al.
Published: (2026)
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
by: Yang, Liu, et al.
Published: (2025)
SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read
by: Peng, Yibo, et al.
Published: (2026)
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
by: Sun, Kaiyue, et al.
Published: (2025)
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
by: Huang, Zhe, et al.
Published: (2025)
Retrieval Augmented Image Harmonization
by: Wang, Haolin, et al.
Published: (2024)
Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities
by: Liu, Huan, et al.
Published: (2024)
FreeRet: MLLMs as Training-Free Retrievers
by: Zhu, Yuhan, et al.
Published: (2025)
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
by: Guo, Qin, et al.
Published: (2025)
OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner
by: Jiang, Haoyang, et al.
Published: (2026)
ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
by: Xu, Zitong, et al.
Published: (2026)
Open Multimodal Retrieval-Augmented Factual Image Generation
by: Tian, Yang, et al.
Published: (2025)
Retrieval Augmented Recipe Generation
by: Liu, Guoshan, et al.
Published: (2024)
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
by: Wang, Baisen, et al.
Published: (2024)
Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
by: Song, Chenxi, et al.
Published: (2025)
RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation
by: Ling, Run, et al.
Published: (2025)
FunBench: Benchmarking Fundus Reading Skills of MLLMs
by: Wei, Qijie, et al.
Published: (2025)
MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)
Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning
by: You, Xiaoxing, et al.
Published: (2025)
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
by: Qi, Jingyuan, et al.
Published: (2025)
Taming Generative Diffusion Prior for Universal Blind Image Restoration
by: Tu, Siwei, et al.
Published: (2024)
Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification
by: Yan, Renao, et al.
Published: (2023)
Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)
Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
by: Zhu, Fangrui, et al.
Published: (2025)
Image Generation Diversity Issues and How to Tame Them
by: Dombrowski, Mischa, et al.
Published: (2024)
ColorFlow: Retrieval-Augmented Image Sequence Colorization
by: Zhuang, Junhao, et al.
Published: (2024)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
by: Wang, Zikai, et al.
Published: (2026)
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
by: Lu, Ruijie, et al.
Published: (2025)
Adapting MLLMs for Nuanced Video Retrieval
by: Bagad, Piyush, et al.
Published: (2025)
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
by: Pei, Yuhan, et al.
Published: (2024)
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
by: Li, Xiaojie, et al.
Published: (2025)
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
by: Tong, Lei, et al.
Published: (2025)
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
by: Liu, Yujia, et al.
Published: (2024)
SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs
by: Lou, Haoran, et al.
Published: (2026)