:: Library Catalog

Bibliographic Details
Main Authors:	Tang, Hao; Xie, Chenwei; Bao, Xiaoyi; Weng, Tingyu; Li, Pandeng; Zheng, Yun; Wang, Liwei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.23278

Similar Items

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
by: Tang, Hao, et al.
Published: (2025)

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
by: Bao, Xiaoyi, et al.
Published: (2025)

UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition
by: Ghilotti, Filippo, et al.
Published: (2026)

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

UniMesh: Unifying 3D Mesh Understanding and Generation
by: Huang, Peng, et al.
Published: (2026)

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
by: He, Xin, et al.
Published: (2025)

Unified Personalized Understanding, Generating and Editing
by: Zhong, Yu, et al.
Published: (2026)

OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
by: Liu, Zhihang, et al.
Published: (2025)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
by: Zheng, Dian, et al.
Published: (2026)

UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation
by: Chen, Yaxiong, et al.
Published: (2025)

Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
by: Wang, Peiyu, et al.
Published: (2025)

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
by: Tian, Rui, et al.
Published: (2025)

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)

UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)

UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying
by: Bai, Chengyu, et al.
Published: (2025)

UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models
by: Li, Jinke, et al.
Published: (2025)

UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)

Aligned Better, Listen Better for Audio-Visual Large Language Models
by: Guo, Yuxin, et al.
Published: (2025)

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)

CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation
by: Zhu, Wenqi, et al.
Published: (2024)

UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
by: Fu, Tsu-Jui, et al.
Published: (2025)

Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing
by: Liu, Jialun, et al.
Published: (2026)

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: Inclusion AI, et al.
Published: (2026)

UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
by: Zhang, Ruiheng, et al.
Published: (2026)

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
by: An, Ruichuan, et al.
Published: (2025)

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
by: Pan, Kaihang, et al.
Published: (2024)

Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)

UniHash: Unifying Pointwise and Pairwise Hashing Paradigms
by: Ma, Xiaoxu, et al.
Published: (2026)

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)

Towards Generalized Multi-Image Editing for Unified Multimodal Models
by: Xu, Pengcheng, et al.
Published: (2026)

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
by: Mao, Weijia, et al.
Published: (2025)

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)

UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
by: Li, Yanlin, et al.
Published: (2026)