| Main Authors: | Yang, Ling, Wang, Zhanyu, Chen, Zhenghao, Liang, Xinyu, Zhou, Luping |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.02233 |
Similar Items
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
by: Wang, Zhanyu, et al.
Published: (2023)
S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework
by: Li, Yingshu, et al.
Published: (2025)
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
by: Li, Yingshu, et al.
Published: (2024)
Reversing the Flow: Generation-to-Understanding Synergy in Large Multimodal Models
by: Tong, Yujun, et al.
Published: (2026)
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)
MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images
by: Deria, Ankan, et al.
Published: (2026)
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
by: Li, Yingshu, et al.
Published: (2023)
PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
by: Wang, Zicheng, et al.
Published: (2024)
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)
Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression
by: Chen, Zhenghao, et al.
Published: (2024)
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
by: Lin, Hongkai, et al.
Published: (2025)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications
by: Peng, Yubo, et al.
Published: (2025)
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models
by: Huang, Yu, et al.
Published: (2025)
Evaluating Attribute Comprehension in Large Vision-Language Models
by: Zhang, Haiwen, et al.
Published: (2024)
Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)
UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
by: Shen, Tao, et al.
Published: (2025)
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
by: Pan, Jiadong, et al.
Published: (2026)
UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models
by: Li, Jinke, et al.
Published: (2025)
UniBrain: A Unified Model for Cross-Subject Brain Decoding
by: Wang, Zicheng, et al.
Published: (2024)
Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models
by: Xu, Dunyuan, et al.
Published: (2026)
SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
by: Zhao, Weiren, et al.
Published: (2026)
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
by: Zhao, Shanshan, et al.
Published: (2025)
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
by: Lu, Yanzuo, et al.
Published: (2025)
Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
by: Zhou, Zhiwang, et al.
Published: (2025)
LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding
by: Dong, Xuanzhao, et al.
Published: (2025)
Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment
by: Li, Yuchen, et al.
Published: (2026)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)
Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models
by: Li, Xiaohe, et al.
Published: (2026)
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)
Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning
by: Sinha, Sanchit, et al.
Published: (2026)
MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
by: Liang, Qian, et al.
Published: (2025)
UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
by: Pan, Hewen, et al.
Published: (2025)
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)