:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Jianglin; Wang, Hailing; Xu, Yi; Wang, Yizhou; Yang, Kuo; Fu, Yun
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.05184

Similar Items

The Indra Representation Hypothesis for Multimodal Alignment
by: Lu, Jianglin, et al.
Published: (2026)

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
by: Dong, Qihua, et al.
Published: (2026)

Scale-Free Graph-Language Models
by: Lu, Jianglin, et al.
Published: (2025)

Embodied Representation Alignment with Mirror Neurons
by: Zhu, Wentao, et al.
Published: (2025)

A Survey of Resource-efficient LLM and Multimodal Foundation Models
by: Xu, Mengwei, et al.
Published: (2024)

Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Images and Videos
by: Wang, Yizhou, et al.
Published: (2023)

A Theoretical Survey on Foundation Models
by: Fu, Shi, et al.
Published: (2024)

Don't Judge by the Look: Towards Motion Coherent Video Representation
by: Zhang, Yitian, et al.
Published: (2024)

Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
by: Chen, Zhawnen, et al.
Published: (2024)

Learning to Rotate: Temporal and Semantic Rotary Encoding for Sequential Modeling
by: Cheng, Hailing, et al.
Published: (2026)

AI Alignment: A Comprehensive Survey
by: Ji, Jiaming, et al.
Published: (2023)

D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
by: Huang, Yiyang, et al.
Published: (2025)

Trajectory Prediction Meets Large Language Models: A Survey
by: Xu, Yi, et al.
Published: (2025)

Distorted or Fabricated? A Survey on Hallucination in Video LLMs
by: Huang, Yiyang, et al.
Published: (2026)

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
by: Fu, Deqing, et al.
Published: (2024)

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
by: Wu, Hao, et al.
Published: (2026)

Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
by: Tang, Shixiang, et al.
Published: (2025)

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning
by: Wu, Yizhou, et al.
Published: (2026)

MIO: A Foundation Model on Multimodal Tokens
by: Wang, Zekun, et al.
Published: (2024)

Multimodal Representation Alignment for Cross-modal Information Retrieval
by: Xu, Fan, et al.
Published: (2025)

Beyond Interleaving: Causal Attention Reformulations for Generative Recommender Systems
by: Cheng, Hailing
Published: (2026)

A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning
by: Yang, Tianyu, et al.
Published: (2026)

Session-Level Spoken Language Assessment with a Multimodal Foundation Model via Multi-Target Learning
by: Lin, Hong-Yun, et al.
Published: (2025)

Understanding the Emergence of Multimodal Representation Alignment
by: Tjandrasuwita, Megan, et al.
Published: (2025)

A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach
by: Lv, Xinpeng, et al.
Published: (2026)

Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)

Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
by: Wu, Biao, et al.
Published: (2024)

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
by: Hu, Ming, et al.
Published: (2025)

From Efficient Multimodal Models to World Models: A Survey
by: Mai, Xinji, et al.
Published: (2024)

Accessing Vision Foundation Models via ImageNet-1K
by: Zhang, Yitian, et al.
Published: (2024)

Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models
by: Tang, Zhenchen, et al.
Published: (2026)

Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment
by: Yi, Lingjie, et al.
Published: (2025)

Deploying Foundation Model Powered Agent Services: A Survey
by: Xu, Wenchao, et al.
Published: (2024)

Synergizing Foundation Models and Federated Learning: A Survey
by: Li, Shenghui, et al.
Published: (2024)

EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation
by: Zhang, Xiaoxiong, et al.
Published: (2025)

Revisiting Model Stitching In the Foundation Model Era
by: Mai, Zheda, et al.
Published: (2026)

Boosting Large Language Models with Mask Fine-Tuning
by: Zhang, Mingyuan, et al.
Published: (2025)

ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model
by: Xu, Yuhao, et al.
Published: (2026)

From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
by: Zhou, Chenyue, et al.
Published: (2025)