| Main authors: | Leng, Sicong, Wang, Jing, Li, Jiaxi, Zhang, Hao, Hu, Zhiqiang, Zhang, Boqiang, Jiang, Yuming, Zhang, Hang, Li, Xin, Bing, Lidong, Zhao, Deli, Lu, Wei, Rong, Yu, Sun, Aixin, Lu, Shijian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.21268 |
Similar items
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
by: Leng, Sicong, et al.
Published: (2024)
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
by: Zhang, Boqiang, et al.
Published: (2025)
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
by: Cheng, Zesen, et al.
Published: (2024)
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
by: Yang, Zuhao, et al.
Published: (2025)
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
by: Zhang, Wenqi, et al.
Published: (2025)
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
by: Cheng, Zesen, et al.
Published: (2024)
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
by: Yuan, Yuqian, et al.
Published: (2024)
Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents
by: Li, Long, et al.
Published: (2024)
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
by: Yuan, Ruifeng, et al.
Published: (2025)
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
by: Zhang, Kaichen, et al.
Published: (2025)
Local well-posedness of strong solutions to the compressible Navier-Stokes equations with degenerate viscosities and far field vacuum in 3D exterior domains
by: Li, Jiaxu, et al.
Published: (2026)
Strong solutions to the initial-boundary-value problem of compressible MHD equations with degenerate viscosities and far field vacuum in 3D exterior domains
by: Li, Jiaxu, et al.
Published: (2026)
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
by: Wang, Jianyu, et al.
Published: (2025)
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
by: Wang, Zhikai, et al.
Published: (2025)
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
by: Tang, Zichen, et al.
Published: (2025)
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
by: Li, Jiachun, et al.
Published: (2026)
Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning
by: Jiang, Xueying, et al.
Published: (2025)
Multimodal 3D Reasoning Segmentation with Complex Scenes
by: Jiang, Xueying, et al.
Published: (2024)
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
by: Chen, Guizhen, et al.
Published: (2025)
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
by: Zhao, Yiran, et al.
Published: (2025)
Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation
by: Zhang, Jianyu, et al.
Published: (2024)
On the Generalization Capacities of MLLMs for Spatial Intelligence
by: Zhang, Gongjie, et al.
Published: (2026)
Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs
by: Jiang, Xueying, et al.
Published: (2026)
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models
by: Liu, Zeyu, et al.
Published: (2025)
On the Role of Discreteness in Diffusion LLMs
by: Jin, Ziqi, et al.
Published: (2025)
MMR: Evaluating Reading Ability of Large Multimodal Models
by: Chen, Jian, et al.
Published: (2024)
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
by: An, Wenbin, et al.
Published: (2024)
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
by: Hu, Zengjie, et al.
Published: (2025)
MM-OpenFGL: A Comprehensive Benchmark for Multimodal Federated Graph Learning
by: Li, Xunkai, et al.
Published: (2026)
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
by: Dang, Ronghao, et al.
Published: (2025)
Open-Vocabulary Object Detection via Language Hierarchy
by: Huang, Jiaxing, et al.
Published: (2024)
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs
by: Li, Zongzhao, et al.
Published: (2025)
RynnBrain: Open Embodied Foundation Models
by: Dang, Ronghao, et al.
Published: (2026)
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
by: Yang, Zuhao, et al.
Published: (2026)
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
by: Zhu, Kejian, et al.
Published: (2025)
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
by: Zhao, Ruochen, et al.
Published: (2024)
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
by: Li, Duo, et al.
Published: (2025)
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
by: Zhu, Yongxin, et al.
Published: (2024)
Boosting Reasoning in Large Multimodal Models via Activation Replay
by: Xing, Yun, et al.
Published: (2025)
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
by: Zhang, Jingyi, et al.
Published: (2025)