| Main Authors: | Shao, Hang; Gao, Heting; Shen, Yunhang; Chen, Jiawei; Long, Zuwei; Yang, Dong; Li, Ke; Sun, Xing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.21864 |
Similar Items
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
by: Long, Zuwei, et al.
Published: (2025)
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
by: Fu, Chaoyou, et al.
Published: (2025)
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
by: Wang, Xiong, et al.
Published: (2024)
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her
by: Gao, Heting, et al.
Published: (2025)
VITA: Towards Open-Source Interactive Omni Multimodal LLM
by: Fu, Chaoyou, et al.
Published: (2024)
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
by: Zhang, Haonan, et al.
Published: (2025)
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
by: Fang, Qingkai, et al.
Published: (2024)
Omni-Referring Image Segmentation
by: Zheng, Qiancheng, et al.
Published: (2025)
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
by: Zhou, Zhuoshan, et al.
Published: (2026)
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
by: Shi, Jingze, et al.
Published: (2026)
VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models
by: Qin, Guangshuo, et al.
Published: (2026)
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
by: Li, Yunxin, et al.
Published: (2025)
FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification
by: Sun, Zhen, et al.
Published: (2025)
OmniGAIA: Towards Native Omni-Modal AI Agents
by: Li, Xiaoxi, et al.
Published: (2026)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
Towards Unsupervised Speech Recognition Without Pronunciation Models
by: Ni, Junrui, et al.
Published: (2024)
SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs
by: Bo, Zi-Hao, et al.
Published: (2026)
MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration
by: Kong, Lingshun, et al.
Published: (2026)
MoTAS: MoE-Guided Feature Selection from TTS-Augmented Speech for Enhanced Multimodal Alzheimer's Early Screening
by: Shao, Yongqi, et al.
Published: (2025)
MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding
by: Li, Yu, et al.
Published: (2025)
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
by: Chen, Junyi, et al.
Published: (2023)
MoE-Loco: Mixture of Experts for Multitask Locomotion
by: Huang, Runhan, et al.
Published: (2025)
iSchools and Non-iSchools in the USA: An Examination of Their Master's Programs
by: Chu, Heting
Published: (2012)
Hyperlinks: How Well Do They Represent the Intellectual Content of Digital Collections?
by: Chu, Heting
Published: (1997)
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
by: Yun, Sukwon, et al.
Published: (2024)
Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
by: Ma, Yingjie, et al.
Published: (2024)
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
by: Chen, Qian, et al.
Published: (2025)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE
by: Wang, Lipeng, et al.
Published: (2025)
Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
by: Yu, Tao, et al.
Published: (2026)
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts
by: Xin, Jiayi, et al.
Published: (2025)
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
by: Jia, Yiduo, et al.
Published: (2026)
AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
by: Chen, Hao, et al.
Published: (2024)
GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)
Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities?
by: Jiang, Hongbo, et al.
Published: (2026)