:: Library Catalog

Bibliographic Details
Main Authors:	Cao, Shiwen, Zhang, Zhaoxing, Jiao, Junming, Qiao, Juyi, Song, Guowen, Shen, Rong, Meng, Xiangbing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.17213

Similar Items

Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning
by: Yue, Feng, et al.
Published: (2025)

DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
by: Hei, Zijian, et al.
Published: (2024)

HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation
by: Liu, Pei, et al.
Published: (2025)

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
by: Jia, Mengzhao, et al.
Published: (2024)

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)

Total Syntheses of Highly Oxidized Natural Products
by: Yan Wang, et al.
Published: (2025)

Fostering Sustainable Cooperation through Strategic Resource Allocation and Utilization on Social Networks
by: Li, Juyi, et al.
Published: (2025)

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
by: Li, Jiaze, et al.
Published: (2025)

Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
by: Huang, Muye, et al.
Published: (2025)

ETBHD‐HMF: A Hierarchical Multimodal Fusion Architecture for Enhanced Text‐Based Hair Design
by: Rong He, et al.
Published: (2024)

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
by: Fan, Yue, et al.
Published: (2024)

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
by: Pan, Junwen, et al.
Published: (2025)

Reinforce LLM Reasoning through Multi-Agent Reflection
by: Yuan, Yurun, et al.
Published: (2025)

GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
by: Sheng, Juyi, et al.
Published: (2025)

HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding
by: Shi, Mengqi, et al.
Published: (2026)

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)

Visual Attention Reasoning via Hierarchical Search and Self-Verification
by: Cai, Wei, et al.
Published: (2025)

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding
by: Wang, Shaoguang, et al.
Published: (2026)

A Multi-Agent Framework with Structured Reasoning and Reflective Refinement for Multimodal Empathetic Response Generation
by: Wang, Liping, et al.
Published: (2026)

Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing
by: Le, Hieu, et al.
Published: (2026)

A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy
by: Li, Zhaoxing
Published: (2024)

Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon in High-Dimensional Time Series
by: Gao, Zhaoxing
Published: (2024)

Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
by: Zhang, Yikai, et al.
Published: (2025)

State-Space Hierarchical Compression with Gated Attention and Learnable Sampling for Hour-Long Video Understanding in Large Multimodal Models
by: Kim, Geewook, et al.
Published: (2025)

TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
by: He, Haoyang, et al.
Published: (2026)

Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)

SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark
by: Wang, Gui, et al.
Published: (2026)

Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation
by: Su, Junming, et al.
Published: (2024)

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)

EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)

The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
by: Yin, Chenlong, et al.
Published: (2025)

HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration
by: Zhu, Yuehan, et al.
Published: (2026)

Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding
by: Pereira, Joao, et al.
Published: (2025)

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
by: Du, Yu, et al.
Published: (2024)

Construction of organ of Corti organoid to study the effects of berberine sulfate on damaged auditory cells
by: Zhang, Junming, et al.
Published: (2024)

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
by: Liu, Junming, et al.
Published: (2025)