Bibliographic Details
Main Authors:	Yang, Ling, Wang, Zhanyu, Chen, Zhenghao, Liang, Xinyu, Zhou, Luping
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2312.02233

Similar Items

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
by: Wang, Zhanyu, et al.
Published: (2023)

S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework
by: Li, Yingshu, et al.
Published: (2025)

KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
by: Li, Yingshu, et al.
Published: (2024)

Reversing the Flow: Generation-to-Understanding Synergy in Large Multimodal Models
by: Tong, Yujun, et al.
Published: (2026)

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images
by: Deria, Ankan, et al.
Published: (2026)

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
by: Li, Yingshu, et al.
Published: (2023)

PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
by: Wang, Zicheng, et al.
Published: (2024)

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)

Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression
by: Chen, Zhenghao, et al.
Published: (2024)

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)

A Unified Image-Dense Annotation Generation Model for Underwater Scenes
by: Lin, Hongkai, et al.
Published: (2025)

Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)

Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications
by: Peng, Yubo, et al.
Published: (2025)

MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models
by: Huang, Yu, et al.
Published: (2025)

Evaluating Attribute Comprehension in Large Vision-Language Models
by: Zhang, Haiwen, et al.
Published: (2024)

Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)

UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)

MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
by: Shen, Tao, et al.
Published: (2025)

Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
by: Pan, Jiadong, et al.
Published: (2026)

UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models
by: Li, Jinke, et al.
Published: (2025)

UniBrain: A Unified Model for Cross-Subject Brain Decoding
by: Wang, Zicheng, et al.
Published: (2024)

Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models
by: Xu, Dunyuan, et al.
Published: (2026)

SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
by: Zhao, Weiren, et al.
Published: (2026)

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
by: Zhao, Shanshan, et al.
Published: (2025)

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
by: Lu, Yanzuo, et al.
Published: (2025)

Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
by: Zhou, Zhiwang, et al.
Published: (2025)

LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding
by: Dong, Xuanzhao, et al.
Published: (2025)

Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment
by: Li, Yuchen, et al.
Published: (2026)

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)

Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models
by: Li, Xiaohe, et al.
Published: (2026)

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)

Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning
by: Sinha, Sanchit, et al.
Published: (2026)

MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
by: Liang, Qian, et al.
Published: (2025)

UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
by: Pan, Hewen, et al.
Published: (2025)

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)