| Main Authors: | Cao, Shiwen, Zhang, Zhaoxing, Jiao, Junming, Qiao, Juyi, Song, Guowen, Shen, Rong, Meng, Xiangbing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.17213 |
Similar Items
Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning
by: Yue, Feng, et al.
Published: (2025)
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
by: Hei, Zijian, et al.
Published: (2024)
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation
by: Liu, Pei, et al.
Published: (2025)
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
by: Jia, Mengzhao, et al.
Published: (2024)
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)
Total Syntheses of Highly Oxidized Natural Products†
by: Yan Wang, et al.
Published: (2025)
Fostering Sustainable Cooperation through Strategic Resource Allocation and Utilization on Social Networks
by: Li, Juyi, et al.
Published: (2025)
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
by: Li, Jiaze, et al.
Published: (2025)
Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
by: Huang, Muye, et al.
Published: (2025)
ETBHD‐HMF: A Hierarchical Multimodal Fusion Architecture for Enhanced Text‐Based Hair Design
by: Rong He, et al.
Published: (2024)
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)
GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
by: Fan, Yue, et al.
Published: (2024)
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
by: Pan, Junwen, et al.
Published: (2025)
Reinforce LLM Reasoning through Multi-Agent Reflection
by: Yuan, Yurun, et al.
Published: (2025)
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
by: Sheng, Juyi, et al.
Published: (2025)
HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding
by: Shi, Mengqi, et al.
Published: (2026)
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)
Visual Attention Reasoning via Hierarchical Search and Self-Verification
by: Cai, Wei, et al.
Published: (2025)
Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding
by: Wang, Shaoguang, et al.
Published: (2026)
A Multi-Agent Framework with Structured Reasoning and Reflective Refinement for Multimodal Empathetic Response Generation
by: Wang, Liping, et al.
Published: (2026)
Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing
by: Le, Hieu, et al.
Published: (2026)
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy
by: Li, Zhaoxing
Published: (2024)
Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon in High-Dimensional Time Series
by: Gao, Zhaoxing
Published: (2024)
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
by: Zhang, Yikai, et al.
Published: (2025)
State-Space Hierarchical Compression with Gated Attention and Learnable Sampling for Hour-Long Video Understanding in Large Multimodal Models
by: Kim, Geewook, et al.
Published: (2025)
TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
by: He, Haoyang, et al.
Published: (2026)
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)
SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark
by: Wang, Gui, et al.
Published: (2026)
Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation
by: Su, Junming, et al.
Published: (2024)
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
by: Yin, Chenlong, et al.
Published: (2025)
HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration
by: Zhu, Yuehan, et al.
Published: (2026)
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding
by: Pereira, Joao, et al.
Published: (2025)
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
by: Du, Yu, et al.
Published: (2024)
Construction of organ of Corti organoid to study the effects of berberine sulfate on damaged auditory cells
by: Junming Zhang, et al.
Published: (2024)
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
by: Liu, Junming, et al.
Published: (2025)