| Main Authors: | Yuan, Ruifeng, Xiao, Chenghao, Leng, Sicong, Wang, Jianyu, Li, Long, Xu, Weiwen, Chan, Hou Pong, Zhao, Deli, Xu, Tingyang, Wei, Zhongyu, Zhang, Hao, Rong, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.22607 |
Similar Items
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
by: LASA Team, et al.
Published: (2025)
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
by: Sun, Yu, et al.
Published: (2025)
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)
Scaling Language-Centric Omnimodal Representation Learning
by: Xiao, Chenghao, et al.
Published: (2025)
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
by: Chen, Guizhen, et al.
Published: (2025)
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs
by: Li, Zongzhao, et al.
Published: (2025)
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)
Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
by: Xiao, Chenghao, et al.
Published: (2025)
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
by: Leng, Sicong, et al.
Published: (2025)
S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images
by: Li, Qingxiao, et al.
Published: (2026)
Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
by: Hu, Zhe, et al.
Published: (2025)
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
by: Zhang, Wenxuan, et al.
Published: (2024)
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
by: Su, Yanzhou, et al.
Published: (2025)
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
by: Wang, Weiyun, et al.
Published: (2025)
AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction
by: Hu, Zhe, et al.
Published: (2023)
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
by: Cheng, Zesen, et al.
Published: (2024)
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
by: Leng, Sicong, et al.
Published: (2024)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
by: Guo, Jarvis, et al.
Published: (2024)
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
by: Wang, Xiaokun, et al.
Published: (2025)
InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
by: Zhao, Ruochen, et al.
Published: (2024)
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
by: Xiao, Wenyi, et al.
Published: (2026)
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
by: Zhao, Yiran, et al.
Published: (2025)
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
by: HyperAI Team, et al.
Published: (2025)
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
by: Wang, Ke, et al.
Published: (2025)
DocCogito: Aligning Layout Cognition and Step-Level Grounded Reasoning for Document Understanding
by: Wu, Yuchuan, et al.
Published: (2026)
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
by: Li, Zongzhao, et al.
Published: (2025)
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)
Cogito Smart Journal
Published: (2017)
Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs
by: Shu, Yan, et al.
Published: (2025)
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
by: Lu, Jinghui, et al.
Published: (2026)
AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
by: Fan, Kaixuan, et al.
Published: (2025)
Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation
by: Hu, Zhe, et al.
Published: (2024)
ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)
Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness
by: Cheang, Chi Seng, et al.
Published: (2025)
Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis
by: Long, Zhuohan, et al.
Published: (2026)
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
by: Wu, Zhiyu, et al.
Published: (2024)