| Main Authors: | Wang, Kaishen, Huang, Heng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.27332 |
Similar Items
ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation
by: Wang, Kaishen, et al.
Published: (2025)
Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)
Towards Understanding Unsafe Video Generation
by: Pang, Yan, et al.
Published: (2024)
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
by: Zhang, Yimeng, et al.
Published: (2023)
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
by: Pan, Jiadong, et al.
Published: (2026)
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
by: Wang, Zhanyu, et al.
Published: (2023)
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)
Why Text Prevails: Vision May Undermine Multimodal Medical Decision Making
by: Dai, Siyuan, et al.
Published: (2025)
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
by: Zhao, Shanshan, et al.
Published: (2025)
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
by: Zhang, Yuting, et al.
Published: (2025)
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
by: Liao, Kang, et al.
Published: (2025)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
by: Sun, Yanpeng, et al.
Published: (2025)
CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
by: Hao, Xiangzhao, et al.
Published: (2026)
No Safe Dose: How Training Data Drives Unsafe Image Generation
by: Friedrich, Felix, et al.
Published: (2026)
Beyond the Safety Tax: Mitigating Unsafe Text-to-Image Generation via External Safety Rectification
by: Meng, Xiangtao, et al.
Published: (2025)
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
by: Zou, Jialv, et al.
Published: (2025)
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)
Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
by: Zhou, Zhiwang, et al.
Published: (2025)
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
by: Xie, Jinheng, et al.
Published: (2024)
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
by: Shaker, Abdelrahman, et al.
Published: (2026)
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation
by: Yang, Ling, et al.
Published: (2023)
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
by: Wang, Xiang, et al.
Published: (2025)
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
by: Tang, Hao, et al.
Published: (2025)
FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
by: Hu, Zhuozhao, et al.
Published: (2025)
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
by: Zhao, Yue, et al.
Published: (2025)
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
by: Lu, Yanzuo, et al.
Published: (2025)
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
by: He, Xin, et al.
Published: (2025)
OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
by: Jiang, Jingjing, et al.
Published: (2025)