Library Catalog

Bibliographic Details
Main Authors:	Tian, Juanxi, Li, Siyuan, He, Conghui, Wu, Lijun, Tan, Cheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.01816

Similar Items

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)

WorldScore: A Unified Evaluation Benchmark for World Generation
by: Duan, Haoyi, et al.
Published: (2025)

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
by: Lin, Honglin, et al.
Published: (2026)

GEBench: Benchmarking Image Generation Models as GUI Environments
by: Li, Haodong, et al.
Published: (2026)

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)

A Survey on Mixup Augmentations and Beyond
by: Jin, Xin, et al.
Published: (2024)

iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework
by: Fang, Jianjie, et al.
Published: (2026)

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing
by: Xu, Bangrui, et al.
Published: (2026)

OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning
by: Li, Siyuan, et al.
Published: (2022)

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

Envisioning global urban development with satellite imagery and generative AI
by: Sun, Kailai, et al.
Published: (2026)

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
by: Qin, Luozheng, et al.
Published: (2026)

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
by: Wang, Dianyi, et al.
Published: (2026)

Understanding and Harnessing Sparsity in Unified Multimodal Models
by: He, Shwai, et al.
Published: (2025)

Understanding Semantic Perturbations on In-Processing Generative Image Watermarks
by: Nakra, Anirudh, et al.
Published: (2026)

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
by: Chen, Siqi, et al.
Published: (2025)

UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)

Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
by: Li, Haiyang, et al.
Published: (2025)

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
by: Qu, Liao, et al.
Published: (2024)

Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)

Archon: A Unified Multimodal Model for Holistic Digital Human Generation
by: Bao, Chong, et al.
Published: (2026)

CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
by: Li, Xuchen, et al.
Published: (2025)

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)

Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)

LVBench: An Extreme Long Video Understanding Benchmark
by: Wang, Weihan, et al.
Published: (2024)

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
by: Wu, Chengyue, et al.
Published: (2024)

VIGC: Visual Instruction Generation and Correction
by: Wang, Bin, et al.
Published: (2023)

PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning
by: Zhang, Dongxu, et al.
Published: (2026)

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)

CausalAffect: Causal Discovery for Facial Affective Understanding
by: Hu, Guanyu, et al.
Published: (2025)

UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)

Dual Diffusion for Unified Image Generation and Understanding
by: Li, Zijie, et al.
Published: (2024)