Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuan; Dou, Sihao; Hu, Kai; Deng, Shuhua; Cao, Chunhong; Xiao, Fen; Gao, Xieping
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.25778

Similar Items

Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation
by: Huang, Yanglin, et al.
Published: (2025)

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
by: Lin, Jianghang, et al.
Published: (2025)

Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection
by: Han, Xu, et al.
Published: (2024)

Explicit Relational Reasoning Network for Scene Text Detection
by: Su, Yuchen, et al.
Published: (2024)

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
by: Jin, Peng, et al.
Published: (2024)

Self-Supervised Learning for Endoscopic Video Analysis
by: Hirsch, Roy, et al.
Published: (2023)

Cognitive-Inspired Hierarchical Attention Fusion With Visual and Textual for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)

Learning to Rank Patches for Unbiased Image Redundancy Reduction
by: Luo, Yang, et al.
Published: (2024)

BIMM: Brain Inspired Masked Modeling for Video Representation Learning
by: Wan, Zhifan, et al.
Published: (2024)

Out of Length Text Recognition with Sub-String Matching
by: Du, Yongkun, et al.
Published: (2024)

LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)

Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
by: Huang, He, et al.
Published: (2025)

EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training
by: Tian, Qingyao, et al.
Published: (2025)

DiffCL: A Diffusion-Based Contrastive Learning Framework with Semantic Alignment for Multimodal Recommendations
by: Song, Qiya, et al.
Published: (2025)

Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling
by: Wei, Ruofeng, et al.
Published: (2024)

Joint-Motion Mutual Learning for Pose Estimation in Videos
by: Wu, Sifan, et al.
Published: (2024)

Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment
by: Wang, Lei, et al.
Published: (2024)

Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning
by: You, Weihang, et al.
Published: (2026)

MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)

Manipulating a Tetris-Inspired 3D Video Representation
by: Godbole, Mihir
Published: (2024)

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning
by: Dou, Zi-Yi, et al.
Published: (2024)

A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation
by: Jie, Pengyu, et al.
Published: (2024)

Video Compression with Hierarchical Temporal Neural Representation
by: Zhu, Jun, et al.
Published: (2026)

Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video
by: Guo, Jiaxin, et al.
Published: (2025)

Learning Brain Representation with Hierarchical Visual Embeddings
by: Zheng, Jiawen, et al.
Published: (2026)

EndoGen: Conditional Autoregressive Endoscopic Video Generation
by: Liu, Xinyu, et al.
Published: (2025)

MetaCOG: A Hierarchical Probabilistic Model for Learning Meta-Cognitive Visual Representations
by: Berke, Marlene D., et al.
Published: (2021)

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
by: Zhao, Jianwei, et al.
Published: (2024)

A Heterogeneous Multimodal Graph Learning Framework for Recognizing User Emotions in Social Networks
by: Bhattacharyya, Sree, et al.
Published: (2025)

DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
by: Wang, Haochen, et al.
Published: (2025)

Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification
by: Xie, Ren-Dong, et al.
Published: (2025)

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
by: Wang, Xiao, et al.
Published: (2025)

VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
by: Zhao, Fufangchen, et al.
Published: (2025)

Learning Spatial-Preserving Hierarchical Representations for Digital Pathology
by: Wu, Weiyi, et al.
Published: (2024)

State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction
by: Wei, Yujie, et al.
Published: (2026)

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
by: Bond, Andrew, et al.
Published: (2025)

Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
by: Wang, Yulin, et al.
Published: (2024)

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
by: Kwan, Ho Man, et al.
Published: (2023)

Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)