:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Songsong; Chen, Yuxin; Shan, Ying; Li, Yanwei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.18714

Similar Items

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)

Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision
by: Kim, Jiyeong, et al.
Published: (2026)

Video-As-Prompt: Unified Semantic Control for Video Generation
by: Bian, Yuxuan, et al.
Published: (2025)

Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
by: Mao, Jiawei, et al.
Published: (2025)

JoPano: Unified Panorama Generation via Joint Modeling
by: Feng, Wancheng, et al.
Published: (2025)

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
by: Han, Ruiyan, et al.
Published: (2026)

Interact3D: Compositional 3D Generation of Interactive Objects
by: Shan, Hui, et al.
Published: (2026)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
by: Chen, Zisheng, et al.
Published: (2025)

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
by: Chen, Xiaokang, et al.
Published: (2025)

Archon: A Unified Multimodal Model for Holistic Digital Human Generation
by: Bao, Chong, et al.
Published: (2026)

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
by: Wen, Zimo, et al.
Published: (2026)

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)

A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model
by: Zheng, Qi, et al.
Published: (2026)

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)

UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)

OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)

UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models
by: Li, Jinke, et al.
Published: (2025)

Understanding and Harnessing Sparsity in Unified Multimodal Models
by: He, Shwai, et al.
Published: (2025)

NI-Tex: Non-isometric Image-based Garment Texture Generation
by: Shan, Hui, et al.
Published: (2025)

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning
by: Zhang, Haotian, et al.
Published: (2025)

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
by: Zhuang, Xianwei, et al.
Published: (2025)

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
by: Wang, Dianyi, et al.
Published: (2026)

MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models
by: Zhao, Haozhe, et al.
Published: (2025)

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
by: Qiu, Lu, et al.
Published: (2024)

Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology
by: Yi, Huahui, et al.
Published: (2024)

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
by: Pu, Yuandong, et al.
Published: (2025)

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
by: Wu, Chengyue, et al.
Published: (2024)

Instinct vs. Reflection: Unifying Token and Verbalized Confidence in Multimodal Large Models
by: Dang, Yunkai, et al.
Published: (2026)

IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
by: Chen, Zhihao, et al.
Published: (2023)

PixelBytes: Catching Unified Representation for Multimodal Generation
by: Furfaro, Fabien
Published: (2024)

Enhancing Multimodal Unified Representations for Cross Modal Generalization
by: Huang, Hai, et al.
Published: (2024)

PixelBytes: Catching Unified Embedding for Multimodal Generation
by: Furfaro, Fabien
Published: (2024)

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
by: Inclusion AI, et al.
Published: (2025)

Lance: Unified Multimodal Modeling by Multi-Task Synergy
by: Fu, Fengyi, et al.
Published: (2026)

Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models
by: He, Hulingxiao, et al.
Published: (2026)