| Main Authors: | Shi, Yang, Dong, Yuhao, Ding, Yue, Wang, Yuran, Zhu, Xuanyu, Zhou, Sheng, Liu, Wenting, Tian, Haochen, Wang, Rundong, Wang, Huanqian, Liu, Zuyan, Zeng, Bohan, Chen, Ruizhe, Wang, Qixun, Zhang, Zhuoran, Chen, Xinlong, Tong, Chengzhuo, Li, Bozhou, Liu, Qiang, Wang, Haotian, Yang, Wenjing, Zhang, Yuanxing, Wan, Pengfei, Zhang, Yi-Fan, Liu, Ziwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.24897 |
Similar Items
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
by: Wang, Yuran, et al.
Published: (2025)
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
by: Liu, Tengfei, et al.
Published: (2026)
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
by: Tang, Yuqi, et al.
Published: (2026)
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
by: Bai, Xuehai, et al.
Published: (2026)
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
by: Zhu, Xuanyu, et al.
Published: (2026)
DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models
by: Chen, Xinlong, et al.
Published: (2026)
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
by: Ding, Yue, et al.
Published: (2026)
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
by: Li, Bozhou, et al.
Published: (2025)
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
by: Shi, Yang, et al.
Published: (2025)
Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
by: DataFlow Team, et al.
Published: (2026)
Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
by: Zeng, Bohan, et al.
Published: (2026)
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
by: Tong, Chengzhuo, et al.
Published: (2026)
Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models
by: Li, Bozhou, et al.
Published: (2025)
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
by: Cao, Zhe, et al.
Published: (2025)
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
by: Chen, Xinlong, et al.
Published: (2025)
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
by: Chen, Xinlong, et al.
Published: (2025)
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
by: Liu, Jing, et al.
Published: (2025)
Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)
Toward a unified data-driven turbulence model through multi-objective learning
by: Liu, Zhuoran, et al.
Published: (2025)
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
by: Wang, Zhaowei, et al.
Published: (2023)
A Unified Framework for Optimizing Uniformly Controlled Structures in Quantum Circuits
by: Xu, Chengzhuo, et al.
Published: (2025)
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
by: Liu, Zhenyu, et al.
Published: (2025)
Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
by: Wang, Shaobo, et al.
Published: (2023)
Ola: Pushing the Frontiers of Omni-Modal Language Model
by: Liu, Zuyan, et al.
Published: (2025)
Microstructure Evolution, Mechanical Properties, and Corrosion Behavior of Novel Low‐Density Zr–xAl–0.5Si Alloys
by: Chaoqun Xia, et al.
Published: (2025)
Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
by: Wang, Chuanzhen, et al.
Published: (2026)
MetaWave: A Platform for Unified Implementation of Nonrelativistic and Relativistic Wavefunctions
by: Zhang, Ning, et al.
Published: (2025)
Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
by: Dong, Yuhao, et al.
Published: (2026)
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
by: Fan, Weichen, et al.
Published: (2025)
Synthesis and Performance of Multifunctional Cobalt‐Doped Polydopamine‐Derived Carbon‐Based Electrocatalysts
by: Chengzhuo Xiao, et al.
Published: (2026)
Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning
by: Zhang, Xiaotian, et al.
Published: (2025)
Unified MPI Parallelization of Wave Function Methods: iCIPT2 as a Showcase
by: Wang, Qingpeng, et al.
Published: (2026)
Quality of Evidence for Prenatal Down Syndrome Screening: An Umbrella Review
by: Yuehua Zhang, et al.
Published: (2026)
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
by: Wang, Yikun, et al.
Published: (2025)
The Power of Many: Synergistic Unification of Diverse Augmentations for Efficient Adversarial Robustness
by: Yu-Hang, Wang, et al.
Published: (2025)