Library Catalog

Bibliographic Details
Main Authors:	Yuan, Ruifeng; Xiao, Chenghao; Leng, Sicong; Wang, Jianyu; Li, Long; Xu, Weiwen; Chan, Hou Pong; Zhao, Deli; Xu, Tingyang; Wei, Zhongyu; Zhang, Hao; Rong, Yu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2507.22607

Similar Items

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
by: LASA Team, et al.
Published: (2025)

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
by: Sun, Yu, et al.
Published: (2025)

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)

Scaling Language-Centric Omnimodal Representation Learning
by: Xiao, Chenghao, et al.
Published: (2025)

FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
by: Chen, Guizhen, et al.
Published: (2025)

STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs
by: Li, Zongzhao, et al.
Published: (2025)

SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)

Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)

Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
by: Xiao, Chenghao, et al.
Published: (2025)

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
by: Leng, Sicong, et al.
Published: (2025)

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images
by: Li, Qingxiao, et al.
Published: (2026)

Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
by: Hu, Zhe, et al.
Published: (2025)

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
by: Zhang, Wenxuan, et al.
Published: (2024)

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
by: Su, Yanzhou, et al.
Published: (2025)

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
by: Wang, Weiyun, et al.
Published: (2025)

AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction
by: Hu, Zhe, et al.
Published: (2023)

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
by: Cheng, Zesen, et al.
Published: (2024)

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
by: Leng, Sicong, et al.
Published: (2024)

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
by: Guo, Jarvis, et al.
Published: (2024)

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
by: Wang, Xiaokun, et al.
Published: (2025)

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)

Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
by: Zhao, Ruochen, et al.
Published: (2024)

VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
by: Xiao, Wenyi, et al.
Published: (2026)

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
by: Zhao, Yiran, et al.
Published: (2025)

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
by: HyperAI Team, et al.
Published: (2025)

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
by: Wang, Ke, et al.
Published: (2025)

DocCogito: Aligning Layout Cognition and Step-Level Grounded Reasoning for Document Understanding
by: Wu, Yuchuan, et al.
Published: (2026)

KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)

From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
by: Li, Zongzhao, et al.
Published: (2025)

VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)

Cogito Smart Journal
Published: (2017)

Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs
by: Shu, Yan, et al.
Published: (2025)

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
by: Lu, Jinghui, et al.
Published: (2026)

AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
by: Fan, Kaixuan, et al.
Published: (2025)

Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation
by: Hu, Zhe, et al.
Published: (2024)

ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)

Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness
by: Cheang, Chi Seng, et al.
Published: (2025)

Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis
by: Long, Zhuohan, et al.
Published: (2026)

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
by: Wu, Zhiyu, et al.
Published: (2024)