Library Catalog

Bibliographic Details
Main Authors:	Du, Bodong; Liu, Bowen; Yu, Yang; Ding, Xinpeng; Wu, Zhiheng; Wang, Shuning; Nie, Shuo; Liu, Naiming; Chen, Qifeng; Song, Yangqiu; Li, Xiaomeng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.06537

Similar Items

Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos
by: Liu, Bowen, et al.
Published: (2026)

RadHiera: Semantic Hierarchical Reinforcement Learning for Medical Report Generation
by: Du, Bodong, et al.
Published: (2025)

See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)

Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
by: Yang, Honglong, et al.
Published: (2025)

Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning
by: Du, Bodong, et al.
Published: (2026)

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs
by: Liu, Bowen, et al.
Published: (2024)

WildLMa: Long Horizon Loco-Manipulation in the Wild
by: Qiu, Ri-Zhao, et al.
Published: (2024)

Tri-Plane Mamba: Efficiently Adapting Segment Anything Model for 3D Medical Images
by: Wang, Hualiang, et al.
Published: (2024)

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)

QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence
by: Lin, Zhichao, et al.
Published: (2026)

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
by: Ding, Xinpeng, et al.
Published: (2025)

Spatially Grounded Long-Horizon Task Planning in the Wild
by: Jung, Sehun, et al.
Published: (2026)

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
by: Jiayang, Cheng, et al.
Published: (2026)

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
by: Ding, Xinpeng, et al.
Published: (2024)

WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
by: Liu, Zhiheng, et al.
Published: (2025)

LongVideoAgent: Multi-Agent Reasoning with Long Videos
by: Liu, Runtao, et al.
Published: (2025)

Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking
by: Du, Desong, et al.
Published: (2023)

Towards Subgraph Isomorphism Counting with Graph Kernels
by: Liu, Xin, et al.
Published: (2024)

Channel Modeling and Rate Analysis of Optical Inter-Satellite Link (OISL)
by: Shang, Bodong, et al.
Published: (2025)

MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?
by: Yang, Lin, et al.
Published: (2026)

ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
by: Luo, Bingjun, et al.
Published: (2026)

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
by: Ding, Shuangrui, et al.
Published: (2026)

Beyond Tools: Generative AI as Epistemic Infrastructure in Education
by: Chen, Bodong
Published: (2025)

1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation
by: Liu, Qingfeng, et al.
Published: (2024)

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas
by: Xu, Baixuan, et al.
Published: (2025)

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
by: Wu, Haoning, et al.
Published: (2024)

MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization
by: Xu, Huihui, et al.
Published: (2025)

MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding
by: Su, Yuhao, et al.
Published: (2025)

MedCoT: Medical Chain of Thought via Hierarchical Expert
by: Liu, Jiaxiang, et al.
Published: (2024)

Towards Event-oriented Long Video Understanding
by: Du, Yifan, et al.
Published: (2024)

PyraVid: Hierarchical Multimodal Memory for Long-Horizon Video Reasoning
by: Yan, Sikuan, et al.
Published: (2026)

OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
by: Jiang, Songtao, et al.
Published: (2025)

InfiMed: Low-Resource Medical MLLMs with Advancing Understanding and Reasoning
by: Liu, Zeyu, et al.
Published: (2025)

Token Activation Map to Visually Explain Multimodal LLMs
by: Li, Yi, et al.
Published: (2025)

HiLM-D: Enhancing MLLMs with Multi-Scale High-Resolution Details for Autonomous Driving
by: Ding, Xinpeng, et al.
Published: (2023)

MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection
by: Elbatel, Marawan, et al.
Published: (2025)

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
by: Tian, Zeyue, et al.
Published: (2024)

Synthetic Context Generation for Question Generation
by: Liu, Naiming, et al.
Published: (2024)

Circuit Complexity of Hierarchical Knowledge Tracing and Implications for Log-Precision Transformers
by: Liu, Naiming, et al.
Published: (2026)

MetaCLASS: Metacognitive Coaching for Learning with Adaptive Self-regulation Support
by: Liu, Naiming, et al.
Published: (2026)