| Main Authors: | Zhu, Rui, Shen, Xin, Wu, Shuchen, Miao, Chenxi, Yu, Xin, Li, Yang, Li, Weikang, Xia, Deguo, Huang, Jizhou |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.09430 |
Similar Items
FingerCap: Fine-grained Finger-level Hand Motion Captioning
by: Shen, Xin, et al.
Published: (2025)
Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents
by: Liao, Mingyang, et al.
Published: (2026)
PAIRS: Parametric-Verified Adaptive Information Retrieval and Selection for Efficient RAG
by: Chen, Wang, et al.
Published: (2025)
Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval
by: Chen, Wang, et al.
Published: (2026)
DuCCAE: A Hybrid Engine for Immersive Conversation via Collaboration, Augmentation, and Evolution
by: Shen, Xin, et al.
Published: (2026)
Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration
by: Wang, Dayu, et al.
Published: (2026)
SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning
by: Yu, Wenhan, et al.
Published: (2025)
Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation
by: Yan, Lingyong, et al.
Published: (2026)
Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents
by: Xia, Feifan, et al.
Published: (2025)
CMRAG: Co-modality-based visual document retrieval and question answering
by: Chen, Wang, et al.
Published: (2025)
Cross-LoRA: A Data-Free LoRA Transfer Framework across Heterogeneous LLMs
by: Xia, Feifan, et al.
Published: (2025)
Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis
by: Wu, Jiulong, et al.
Published: (2025)
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)
ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation
by: Yang, Yutong, et al.
Published: (2026)
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
by: Xu, Pengju, et al.
Published: (2025)
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
by: Ying, Shuangshuang, et al.
Published: (2026)
S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
by: Xu, Beining, et al.
Published: (2025)
DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation
by: Xia, Deguo, et al.
Published: (2024)
The MSR-Video to Text Dataset with Clean Annotations
by: Chen, Haoran, et al.
Published: (2021)
M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding
by: Liu, Shenxi, et al.
Published: (2025)
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
by: Ouyang, Kun, et al.
Published: (2025)
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
by: Lee, Youngwan, et al.
Published: (2026)
Spatial-Temporal Multi-level Association for Video Object Segmentation
by: Miao, Deshui, et al.
Published: (2024)
Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring
by: Wang, Zhengyang, et al.
Published: (2026)
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
by: Mundada, Gagan, et al.
Published: (2025)
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
by: Shiri, Fatemeh, et al.
Published: (2024)
Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark
by: Cheng, Ziming, et al.
Published: (2025)
Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective
by: You, Wangjie, et al.
Published: (2025)
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
by: Xiao, Tong, et al.
Published: (2025)
LDMapNet-U: An End-to-End System for City-Scale Lane-Level Map Updating
by: Xia, Deguo, et al.
Published: (2025)
Kinship Data Benchmark for Multi-hop Reasoning
by: Sun, Tianda, et al.
Published: (2026)
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs
by: Wu, Xin, et al.
Published: (2026)
MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)
Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
by: Anand, Dhruv, et al.
Published: (2025)
Video-R1: Reinforcing Video Reasoning in MLLMs
by: Feng, Kaituo, et al.
Published: (2025)
Scaling Spatial Reasoning in MLLMs through Programmatic Data Synthesis
by: Helu, Zhi, et al.
Published: (2025)
Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)
Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
by: Li, Shuaiyi, et al.
Published: (2023)