Library Catalog

Bibliographic Details
Main Authors:	Zeng, Weixuan; Wei, Pengcheng; Wang, Huaiqing; Zhang, Boheng; Sun, Jia; Fan, Dewen; He, Lin; Chen, Long; Gan, Qianqian; Yang, Fan; Gao, Tingting
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.19643

Similar Items

OmniVTON: Training-Free Universal Virtual Try-On
by: Yang, Zhaotong, et al.
Published: (2025)

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
by: Yang, Qize, et al.
Published: (2025)

OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance
by: Yang, Zhaotong, et al.
Published: (2026)

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
by: Wang, Jiyuan, et al.
Published: (2026)

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)

HumanOmni-Speaker: Identifying Who said What and When
by: Bai, Detao, et al.
Published: (2026)

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
by: Xi, Dianbing, et al.
Published: (2025)

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)

OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
by: Peng, Haosong, et al.
Published: (2025)

OmniPSD: Layered PSD Generation with Diffusion Transformer
by: Liu, Cheng, et al.
Published: (2025)

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
by: Zhang, Guohui, et al.
Published: (2026)

SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers
by: Fei, Zhengcong, et al.
Published: (2025)

Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)

DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing
by: Li, Qi, et al.
Published: (2025)

CASC: Condition-Aware Semantic Communication with Latent Diffusion Models
by: Chen, Weixuan, et al.
Published: (2024)

OmniEncoder: See, Hear, and Feel Continuous Motion Like Humans With One Encoder
by: Bai, Detao, et al.
Published: (2026)

Omni-directional attention mechanism based on Mamba for speech separation
by: Xue, Ke, et al.
Published: (2026)

Logics-Parsing-Omni Technical Report
by: An, Xin, et al.
Published: (2026)

Context Unrolling in Omni Models
by: Yang, Ceyuan, et al.
Published: (2026)

MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer
by: Luan, Junsheng, et al.
Published: (2025)

OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing
by: Bie, Fuqing, et al.
Published: (2025)

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
by: Xin, Yi, et al.
Published: (2025)

Grid: Omni Visual Generation
by: Wan, Cong, et al.
Published: (2024)

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
by: Wang, Chengyao, et al.
Published: (2025)

OmniRe: Omni Urban Scene Reconstruction
by: Chen, Ziyu, et al.
Published: (2024)

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
by: Xie, Tianyu, et al.
Published: (2026)

OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
by: Henry, Felix, et al.
Published: (2026)

OMCAT: Omni Context Aware Transformer
by: Goel, Arushi, et al.
Published: (2024)

Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
by: Zhao, Jiaxing, et al.
Published: (2025)

OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
by: Yan, Qianqi, et al.
Published: (2026)

More than the Sum: Panorama-Language Models for Adverse Omni-Scenes
by: Fan, Weijia, et al.
Published: (2026)

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
by: Peng, Ziqiao, et al.
Published: (2025)

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)

OmniBench: Towards The Future of Universal Omni-Language Models
by: Li, Yizhi, et al.
Published: (2024)

VITA: Towards Open-Source Interactive Omni Multimodal LLM
by: Fu, Chaoyou, et al.
Published: (2024)

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
by: Jia, Yiduo, et al.
Published: (2026)

Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
by: Yu, Tao, et al.
Published: (2026)

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
by: Ma, Ziyang, et al.
Published: (2025)