| Main Authors: | Tian, Juanxi, Li, Siyuan, He, Conghui, Wu, Lijun, Tan, Cheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.01816 |
Similar Items
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)
WorldScore: A Unified Evaluation Benchmark for World Generation
by: Duan, Haoyi, et al.
Published: (2025)
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
by: Lin, Honglin, et al.
Published: (2026)
GEBench: Benchmarking Image Generation Models as GUI Environments
by: Li, Haodong, et al.
Published: (2026)
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)
A Survey on Mixup Augmentations and Beyond
by: Jin, Xin, et al.
Published: (2024)
iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework
by: Fang, Jianjie, et al.
Published: (2026)
MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing
by: Xu, Bangrui, et al.
Published: (2026)
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning
by: Li, Siyuan, et al.
Published: (2022)
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
Envisioning global urban development with satellite imagery and generative AI
by: Sun, Kailai, et al.
Published: (2026)
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
by: Qin, Luozheng, et al.
Published: (2026)
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
by: Wang, Dianyi, et al.
Published: (2026)
Understanding and Harnessing Sparsity in Unified Multimodal Models
by: He, Shwai, et al.
Published: (2025)
Understanding Semantic Perturbations on In-Processing Generative Image Watermarks
by: Nakra, Anirudh, et al.
Published: (2026)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
by: Chen, Siqi, et al.
Published: (2025)
UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
by: Li, Haiyang, et al.
Published: (2025)
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
by: Qu, Liao, et al.
Published: (2024)
Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)
Archon: A Unified Multimodal Model for Holistic Digital Human Generation
by: Bao, Chong, et al.
Published: (2026)
CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
by: Li, Xuchen, et al.
Published: (2025)
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)
Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)
LVBench: An Extreme Long Video Understanding Benchmark
by: Wang, Weihan, et al.
Published: (2024)
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
by: Wu, Chengyue, et al.
Published: (2024)
VIGC: Visual Instruction Generation and Correction
by: Wang, Bin, et al.
Published: (2023)
PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning
by: Zhang, Dongxu, et al.
Published: (2026)
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)
CausalAffect: Causal Discovery for Facial Affective Understanding
by: Hu, Guanyu, et al.
Published: (2025)
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)
Dual Diffusion for Unified Image Generation and Understanding
by: Li, Zijie, et al.
Published: (2024)