| Main Authors: | Zhang, Yuan, Dou, Sihao, Hu, Kai, Deng, Shuhua, Cao, Chunhong, Xiao, Fen, Gao, Xieping |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.25778 |
Similar Items
Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation
by: Huang, Yanglin, et al.
Published: (2025)
What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
by: Lin, Jianghang, et al.
Published: (2025)
Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection
by: Han, Xu, et al.
Published: (2024)
Explicit Relational Reasoning Network for Scene Text Detection
by: Su, Yuchen, et al.
Published: (2024)
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
by: Jin, Peng, et al.
Published: (2024)
Self-Supervised Learning for Endoscopic Video Analysis
by: Hirsch, Roy, et al.
Published: (2023)
Cognitive-Inspired Hierarchical Attention Fusion With Visual and Textual for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)
Learning to Rank Patches for Unbiased Image Redundancy Reduction
by: Luo, Yang, et al.
Published: (2024)
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
by: Wan, Zhifan, et al.
Published: (2024)
Out of Length Text Recognition with Sub-String Matching
by: Du, Yongkun, et al.
Published: (2024)
LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)
Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
by: Huang, He, et al.
Published: (2025)
EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training
by: Tian, Qingyao, et al.
Published: (2025)
DiffCL: A Diffusion-Based Contrastive Learning Framework with Semantic Alignment for Multimodal Recommendations
by: Song, Qiya, et al.
Published: (2025)
Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling
by: Wei, Ruofeng, et al.
Published: (2024)
Joint-Motion Mutual Learning for Pose Estimation in Videos
by: Wu, Sifan, et al.
Published: (2024)
Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment
by: Wang, Lei, et al.
Published: (2024)
Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning
by: You, Weihang, et al.
Published: (2026)
MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)
Manipulating a Tetris-Inspired 3D Video Representation
by: Godbole, Mihir
Published: (2024)
Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning
by: Dou, Zi-Yi, et al.
Published: (2024)
A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation
by: Jie, Pengyu, et al.
Published: (2024)
Video Compression with Hierarchical Temporal Neural Representation
by: Zhu, Jun, et al.
Published: (2026)
Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video
by: Guo, Jiaxin, et al.
Published: (2025)
Learning Brain Representation with Hierarchical Visual Embeddings
by: Zheng, Jiawen, et al.
Published: (2026)
EndoGen: Conditional Autoregressive Endoscopic Video Generation
by: Liu, Xinyu, et al.
Published: (2025)
MetaCOG: A Hierarchical Probabilistic Model for Learning Meta-Cognitive Visual Representations
by: Berke, Marlene D., et al.
Published: (2021)
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
by: Zhao, Jianwei, et al.
Published: (2024)
A Heterogeneous Multimodal Graph Learning Framework for Recognizing User Emotions in Social Networks
by: Bhattacharyya, Sree, et al.
Published: (2025)
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
by: Wang, Haochen, et al.
Published: (2025)
Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification
by: Xie, Ren-Dong, et al.
Published: (2025)
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
by: Wang, Xiao, et al.
Published: (2025)
VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
by: Zhao, Fufangchen, et al.
Published: (2025)
Learning Spatial-Preserving Hierarchical Representations for Digital Pathology
by: Wu, Weiyi, et al.
Published: (2024)
State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)
Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction
by: Wei, Yujie, et al.
Published: (2026)
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
by: Bond, Andrew, et al.
Published: (2025)
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
by: Wang, Yulin, et al.
Published: (2024)
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
by: Kwan, Ho Man, et al.
Published: (2023)
Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)