| Main authors: | Sun, Yuanfu; Li, Kang; Guo, Pengkang; Liu, Jiajin; Tan, Qiaoyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2603.05181 |
Similar items
GraphVLM: Benchmarking Vision Language Models for Multimodal Graph Learning
by: Liu, Jiajin, et al.
Published: (2026)
GraphSearch: Agentic Search-Augmented Reasoning for Zero-Shot Graph Learning
by: Liu, Jiajin, et al.
Published: (2026)
Can MLLMs Reason Beyond Language? VisReason: A Comprehensive Benchmark for Vision-Centric Reasoning
by: Guo, Longteng, et al.
Published: (2026)
SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
by: Guo, Longteng, et al.
Published: (2026)
AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning
by: Sun, Yuanfu, et al.
Published: (2026)
S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding
by: Wang, He, et al.
Published: (2026)
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
by: Zhan, Xiaoyu, et al.
Published: (2025)
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model
by: Li, Manyu, et al.
Published: (2025)
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
by: Fang, Yi, et al.
Published: (2025)
Language-Instructed Reasoning for Group Activity Detection via Multimodal Large Language Model
by: Peng, Jihua, et al.
Published: (2025)
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
by: Dong, Yuhao, et al.
Published: (2024)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
by: Huang, Jiaxin, et al.
Published: (2025)
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
by: Chen, Jiaxing, et al.
Published: (2024)
UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning
by: Guan, Jiajin, et al.
Published: (2025)
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
by: Ning, Zhenhua, et al.
Published: (2025)
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
by: Tan, Cheng, et al.
Published: (2024)
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
by: Yao, Ruilin, et al.
Published: (2025)
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)
Disease-informed Adaptation of Vision-Language Models
by: Zhang, Jiajin, et al.
Published: (2024)
Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation
by: Luo, Jingnan, et al.
Published: (2026)
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
by: Guo, Yuxiang, et al.
Published: (2024)
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
by: Dongfang, Zihao, et al.
Published: (2025)
TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning
by: Liu, Daixian, et al.
Published: (2026)
DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning
by: Majeedi, Abrar, et al.
Published: (2026)
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
by: Xu, Shilin, et al.
Published: (2025)
SPR-128K: A New Benchmark for Spatial Plausibility Reasoning with Multimodal Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2025)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models
by: He, Yulin, et al.
Published: (2026)
Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models
by: Zhang, Cheng, et al.
Published: (2026)
UKnow: A Unified Knowledge Protocol with Multimodal Knowledge Graph Datasets for Reasoning and Vision-Language Pre-Training
by: Gong, Biao, et al.
Published: (2023)
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
by: Luo, Fuwen, et al.
Published: (2024)
Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning
by: Sinha, Sanchit, et al.
Published: (2026)
LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model
by: Sun, Tao, et al.
Published: (2025)
Concept-Centric Token Interpretation for Vector-Quantized Generative Models
by: Yang, Tianze, et al.
Published: (2025)
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
by: Liu, Jianyu, et al.
Published: (2025)
Jailbreaking Multimodal Large Language Models using Multi-Clip Video
by: Kang, Choongwon, et al.
Published: (2026)
LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2026)
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
by: Tang, Yolo Y., et al.
Published: (2025)