| Main Authors: | Tang, Hao, Xie, Chenwei, Bao, Xiaoyi, Weng, Tingyu, Li, Pandeng, Zheng, Yun, Wang, Liwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.23278 |
Similar Items
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
by: Tang, Hao, et al.
Published: (2025)
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
by: Bao, Xiaoyi, et al.
Published: (2025)
UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition
by: Ghilotti, Filippo, et al.
Published: (2026)
UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)
UniMesh: Unifying 3D Mesh Understanding and Generation
by: Huang, Peng, et al.
Published: (2026)
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
by: He, Xin, et al.
Published: (2025)
Unified Personalized Understanding, Generating and Editing
by: Zhong, Yu, et al.
Published: (2026)
OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
by: Liu, Zhihang, et al.
Published: (2025)
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
by: Zheng, Dian, et al.
Published: (2026)
UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation
by: Chen, Yaxiong, et al.
Published: (2025)
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
by: Wang, Peiyu, et al.
Published: (2025)
GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
by: Tian, Rui, et al.
Published: (2025)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)
UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying
by: Bai, Chengyu, et al.
Published: (2025)
UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models
by: Li, Jinke, et al.
Published: (2025)
UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)
Aligned Better, Listen Better for Audio-Visual Large Language Models
by: Guo, Yuxin, et al.
Published: (2025)
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)
CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation
by: Zhu, Wenqi, et al.
Published: (2024)
UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
by: Fu, Tsu-Jui, et al.
Published: (2025)
Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing
by: Liu, Jialun, et al.
Published: (2026)
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
by: Zhang, Ruiheng, et al.
Published: (2026)
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
by: An, Ruichuan, et al.
Published: (2025)
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
by: Pan, Kaihang, et al.
Published: (2024)
Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)
UniHash: Unifying Pointwise and Pairwise Hashing Paradigms
by: Ma, Xiaoxu, et al.
Published: (2026)
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)
Towards Generalized Multi-Image Editing for Unified Multimodal Models
by: Xu, Pengcheng, et al.
Published: (2026)
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
by: Mao, Weijia, et al.
Published: (2025)
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
by: Li, Yanlin, et al.
Published: (2026)