:: Library Catalog

Bibliographic Details
Main Authors:	Tang, Yi; Wang, Kai-Ni; Chen, Yang; He, Xiaopu; Zhou, Guangquan
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.07292

Similar Items

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models
by: Dai, Xuanlang, et al.
Published: (2026)

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
by: Zhao, Weike, et al.
Published: (2025)

InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
by: InternAgent Team, et al.
Published: (2025)

RadFabric: Agentic AI System with Reasoning Capability for Radiology
by: Chen, Wenting, et al.
Published: (2025)

CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI
by: Ali, Hasin Jawad, et al.
Published: (2025)

RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows
by: Zhang, Kai, et al.
Published: (2025)

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
by: Ashraf, Tajamul, et al.
Published: (2025)

EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams
by: Li, Hao, et al.
Published: (2025)

History-Aware Reasoning for GUI Agents
by: Wang, Ziwei, et al.
Published: (2025)

Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
by: Liang, Yunhao, et al.
Published: (2026)

ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding
by: Yang, Jianjiang, et al.
Published: (2025)

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)

From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning
by: Ni, Hang, et al.
Published: (2026)

Sherlock: Self-Correcting Reasoning in Vision-Language Models
by: Ding, Yi, et al.
Published: (2025)

EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery
by: Wang, Guankun, et al.
Published: (2025)

Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)

Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
by: Qin, Zhi, et al.
Published: (2025)

EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction
by: Liu, Yifan, et al.
Published: (2024)

Agent S: An Open Agentic Framework that Uses Computers Like a Human
by: Agashe, Saaket, et al.
Published: (2024)

Agri-R1: Agricultural Reasoning for Disease Diagnosis via Automated-Synthesis and Reinforcement Learning
by: Zhang, Wentao, et al.
Published: (2026)

EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction
by: Wu, Taoyu, et al.
Published: (2025)

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
by: Zhou, Andy, et al.
Published: (2023)

Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs
by: Zheng, Huan, et al.
Published: (2026)

rStar2-Agent: Agentic Reasoning Technical Report
by: Shang, Ning, et al.
Published: (2025)

PresentAgent-2: Towards Generalist Multimodal Presentation Agents
by: Wu, Wei, et al.
Published: (2026)

EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
by: Zhu, Lingting, et al.
Published: (2024)

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation
by: Deng, Dazhen, et al.
Published: (2025)

EndoGen: Conditional Autoregressive Endoscopic Video Generation
by: Liu, Xinyu, et al.
Published: (2025)

Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning
by: Dong, Qihua, et al.
Published: (2026)

EndoDepth: A Benchmark for Assessing Robustness in Endoscopic Depth Prediction
by: Reyes-Amezcua, Ivan, et al.
Published: (2024)

Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison
by: Yang, Qian, et al.
Published: (2024)

MARS: Memory-Enhanced Agents with Reflective Self-improvement
by: Liang, Xuechen, et al.
Published: (2025)

Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
by: Wu, Junfei, et al.
Published: (2024)

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
by: Zheng, Qiaoyu, et al.
Published: (2025)

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
by: Guo, Ziyu, et al.
Published: (2026)

CogniFold: Always-On Proactive Memory via Cognitive Folding
by: Wang, Suli, et al.
Published: (2026)

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
by: Zheng, Kaizhi, et al.
Published: (2022)

EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training
by: Tian, Qingyao, et al.
Published: (2025)