:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Kaishen; Huang, Heng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.27332

Similar Items

ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation
by: Wang, Kaishen, et al.
Published: (2025)

Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)

Towards Understanding Unsafe Video Generation
by: Pang, Yan, et al.
Published: (2024)

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
by: Zhang, Yimeng, et al.
Published: (2023)

Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
by: Pan, Jiadong, et al.
Published: (2026)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
by: Wang, Zhanyu, et al.
Published: (2023)

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)

Why Text Prevails: Vision May Undermine Multimodal Medical Decision Making
by: Dai, Siyuan, et al.
Published: (2025)

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
by: Zhao, Shanshan, et al.
Published: (2025)

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
by: Zhang, Yuting, et al.
Published: (2025)

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
by: Liao, Kang, et al.
Published: (2025)

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)

Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
by: Sun, Yanpeng, et al.
Published: (2025)

CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
by: Hao, Xiangzhao, et al.
Published: (2026)

No Safe Dose: How Training Data Drives Unsafe Image Generation
by: Friedrich, Felix, et al.
Published: (2026)

Beyond the Safety Tax: Mitigating Unsafe Text-to-Image Generation via External Safety Rectification
by: Meng, Xiangtao, et al.
Published: (2025)

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
by: Zou, Jialv, et al.
Published: (2025)

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: Inclusion AI, et al.
Published: (2026)

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)

Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
by: Zhou, Zhiwang, et al.
Published: (2025)

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
by: Xie, Jinheng, et al.
Published: (2024)

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
by: Shaker, Abdelrahman, et al.
Published: (2026)

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)

MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation
by: Yang, Ling, et al.
Published: (2023)

HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
by: Wang, Xiang, et al.
Published: (2025)

UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
by: Tang, Hao, et al.
Published: (2025)

FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
by: Hu, Zhuozhao, et al.
Published: (2025)

QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
by: Zhao, Yue, et al.
Published: (2025)

UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
by: Lu, Yanzuo, et al.
Published: (2025)

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
by: He, Xin, et al.
Published: (2025)

OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)

Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
by: Jiang, Jingjing, et al.
Published: (2025)