:: Library Catalog

Bibliographic Details
Main Authors:	Shi, Yang; Dong, Yuhao; Ding, Yue; Wang, Yuran; Zhu, Xuanyu; Zhou, Sheng; Liu, Wenting; Tian, Haochen; Wang, Rundong; Wang, Huanqian; Liu, Zuyan; Zeng, Bohan; Chen, Ruizhe; Wang, Qixun; Zhang, Zhuoran; Chen, Xinlong; Tong, Chengzhuo; Li, Bozhou; Liu, Qiang; Wang, Haotian; Yang, Wenjing; Zhang, Yuanxing; Wan, Pengfei; Zhang, Yi-Fan; Liu, Ziwei
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.24897

Similar Items

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
by: Wang, Yuran, et al.
Published: (2025)

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
by: Liu, Tengfei, et al.
Published: (2026)

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
by: Tang, Yuqi, et al.
Published: (2026)

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
by: Bai, Xuehai, et al.
Published: (2026)

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
by: Zhu, Xuanyu, et al.
Published: (2026)

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models
by: Chen, Xinlong, et al.
Published: (2026)

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
by: Ding, Yue, et al.
Published: (2026)

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
by: Li, Bozhou, et al.
Published: (2025)

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
by: Shi, Yang, et al.
Published: (2025)

Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
by: DataFlow Team, et al.
Published: (2026)

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
by: Zeng, Bohan, et al.
Published: (2026)

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
by: Tong, Chengzhuo, et al.
Published: (2026)

Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models
by: Li, Bozhou, et al.
Published: (2025)

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
by: Cao, Zhe, et al.
Published: (2025)

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
by: Chen, Xinlong, et al.
Published: (2025)

VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
by: Chen, Xinlong, et al.
Published: (2025)

Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
by: Liu, Jing, et al.
Published: (2025)

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)

Toward a unified data-driven turbulence model through multi-objective learning
by: Liu, Zhuoran, et al.
Published: (2025)

AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
by: Wang, Zhaowei, et al.
Published: (2023)

A Unified Framework for Optimizing Uniformly Controlled Structures in Quantum Circuits
by: Xu, Chengzhuo, et al.
Published: (2025)

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
by: Liu, Zhenyu, et al.
Published: (2025)

Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
by: Wang, Shaobo, et al.
Published: (2023)

Ola: Pushing the Frontiers of Omni-Modal Language Model
by: Liu, Zuyan, et al.
Published: (2025)

Microstructure Evolution, Mechanical Properties, and Corrosion Behavior of Novel Low‐Density Zr–xAl–0.5Si Alloys
by: Xia, Chaoqun, et al.
Published: (2025)

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
by: Wang, Chuanzhen, et al.
Published: (2026)

MetaWave: A Platform for Unified Implementation of Nonrelativistic and Relativistic Wavefunctions
by: Zhang, Ning, et al.
Published: (2025)

Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
by: Dong, Yuhao, et al.
Published: (2026)

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
by: Fan, Weichen, et al.
Published: (2025)

Synthesis and Performance of Multifunctional Cobalt‐Doped Polydopamine‐Derived Carbon‐Based Electrocatalysts
by: Xiao, Chengzhuo, et al.
Published: (2026)

Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning
by: Zhang, Xiaotian, et al.
Published: (2025)

Unified MPI Parallelization of Wave Function Methods: iCIPT2 as a Showcase
by: Wang, Qingpeng, et al.
Published: (2026)

Quality of Evidence for Prenatal Down Syndrome Screening: An Umbrella Review
by: Zhang, Yuehua, et al.
Published: (2026)

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
by: Wang, Yikun, et al.
Published: (2025)

The Power of Many: Synergistic Unification of Diverse Augmentations for Efficient Adversarial Robustness
by: Wang, Yu-Hang, et al.
Published: (2025)