| Main Authors: | Zeng, Weixuan, Wei, Pengcheng, Wang, Huaiqing, Zhang, Boheng, Sun, Jia, Fan, Dewen, HE, Lin, Chen, Long, Gan, Qianqian, Yang, Fan, Gao, Tingting |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.19643 |
Similar Items
OmniVTON: Training-Free Universal Virtual Try-On
by: Yang, Zhaotong, et al.
Published: (2025)
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
by: Yang, Qize, et al.
Published: (2025)
OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance
by: Yang, Zhaotong, et al.
Published: (2026)
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
by: Wang, Jiyuan, et al.
Published: (2026)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
HumanOmni-Speaker: Identifying Who said What and When
by: Bai, Detao, et al.
Published: (2026)
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
by: Xi, Dianbing, et al.
Published: (2025)
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
by: Peng, Haosong, et al.
Published: (2025)
OmniPSD: Layered PSD Generation with Diffusion Transformer
by: Liu, Cheng, et al.
Published: (2025)
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
by: Zhang, Guohui, et al.
Published: (2026)
SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers
by: Fei, Zhengcong, et al.
Published: (2025)
Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)
DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing
by: Li, Qi, et al.
Published: (2025)
CASC: Condition-Aware Semantic Communication with Latent Diffusion Models
by: Chen, Weixuan, et al.
Published: (2024)
OmniEncoder: See, Hear, and Feel Continuous Motion Like Humans With One Encoder
by: Bai, Detao, et al.
Published: (2026)
Omni-directional attention mechanism based on Mamba for speech separation
by: Xue, Ke, et al.
Published: (2026)
Logics-Parsing-Omni Technical Report
by: An, Xin, et al.
Published: (2026)
Context Unrolling in Omni Models
by: Yang, Ceyuan, et al.
Published: (2026)
MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer
by: Luan, Junsheng, et al.
Published: (2025)
OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing
by: Bie, Fuqing, et al.
Published: (2025)
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
by: Xin, Yi, et al.
Published: (2025)
Grid: Omni Visual Generation
by: Wan, Cong, et al.
Published: (2024)
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
by: Wang, Chengyao, et al.
Published: (2025)
OmniRe: Omni Urban Scene Reconstruction
by: Chen, Ziyu, et al.
Published: (2024)
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
by: Xie, Tianyu, et al.
Published: (2026)
OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
by: Henry, Felix, et al.
Published: (2026)
OMCAT: Omni Context Aware Transformer
by: Goel, Arushi, et al.
Published: (2024)
Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
by: Zhao, Jiaxing, et al.
Published: (2025)
OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
by: Yan, Qianqi, et al.
Published: (2026)
More than the Sum: Panorama-Language Models for Adverse Omni-Scenes
by: Fan, Weijia, et al.
Published: (2026)
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
by: Peng, Ziqiao, et al.
Published: (2025)
FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)
OmniBench: Towards The Future of Universal Omni-Language Models
by: Li, Yizhi, et al.
Published: (2024)
VITA: Towards Open-Source Interactive Omni Multimodal LLM
by: Fu, Chaoyou, et al.
Published: (2024)
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
by: Jia, Yiduo, et al.
Published: (2026)
Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
by: Yu, Tao, et al.
Published: (2026)
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
by: Ma, Ziyang, et al.
Published: (2025)