:: Library Catalog

Bibliographic Details
Main Authors:	Dong, Ze; Shi, Hao; Gao, Zejia; Yi, Zhonghua; Wang, Kaiwei; Wang, Lin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.15823

Similar Items

Benchmarking the Robustness of Optical Flow Estimation to Corruptions
by: Yi, Zhonghua, et al.
Published: (2024)

EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data
by: Yi, Zhonghua, et al.
Published: (2024)

Learning to Highlight Audio by Watching Movies
by: Huang, Chao, et al.
Published: (2025)

AoE: Always-on Egocentric Human Video Collection for Embodied AI
by: Yang, Bowen, et al.
Published: (2026)

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras
by: Wang, Luming, et al.
Published: (2026)

EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera
by: Wang, Luming, et al.
Published: (2025)

Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers
by: Jiang, Qi, et al.
Published: (2023)

EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
by: Fan, Yue, et al.
Published: (2024)

Exploring Event-based Human Pose Estimation with 3D Event Representations
by: Yin, Xiaoting, et al.
Published: (2023)

Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer
by: Shi, Hao, et al.
Published: (2022)

LF-PGVIO: A Visual-Inertial-Odometry Framework for Large Field-of-View Cameras using Points and Geodesic Segments
by: Wang, Ze, et al.
Published: (2023)

OmniLens: Towards Universal Lens Aberration Correction via LensLib-to-Specific Domain Adaptation
by: Jiang, Qi, et al.
Published: (2024)

Egocentric Human-Object Interaction Detection: A New Benchmark and Method
by: Deng, Kunyuan, et al.
Published: (2025)

Character-Centric Understanding of Animated Movies
by: Gui, Zhongrui, et al.
Published: (2025)

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
by: Huang, Yifei, et al.
Published: (2024)

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
by: Cheng, Sijie, et al.
Published: (2024)

OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
by: Shi, Hao, et al.
Published: (2025)

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
by: Zhang, Haoyu, et al.
Published: (2025)

Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction
by: Yin, Xiaoting, et al.
Published: (2025)

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
by: Wu, Weijia, et al.
Published: (2024)

AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
by: Suglia, Alessandro, et al.
Published: (2024)

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
by: Wang, Jian, et al.
Published: (2025)

WAT: Online Video Understanding Needs Watching Before Thinking
by: Han, Zifan, et al.
Published: (2026)

LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model
by: Gao, Quankai, et al.
Published: (2026)

EgoAVU: Egocentric Audio-Visual Understanding
by: Seth, Ashish, et al.
Published: (2026)

P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty
by: Zhang, Yufan, et al.
Published: (2024)

OPTIAGENT: A Physics-Driven Agentic Framework for Automated Optical Design
by: Geng, Yuyu, et al.
Published: (2026)

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)

EgoSound: Benchmarking Sound Understanding in Egocentric Videos
by: Zhu, Bingwen, et al.
Published: (2026)

S⁵Mars: Semi-Supervised Learning for Mars Semantic Segmentation
by: Zhang, Jiahang, et al.
Published: (2022)

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
by: Dang, Ronghao, et al.
Published: (2025)

EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
by: Wang, Xiaoqi, et al.
Published: (2025)

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding
by: Zaranis, Emmanouil, et al.
Published: (2025)

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
by: Liu, Yunze, et al.
Published: (2022)

EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
by: Li, Di, et al.
Published: (2025)

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting
by: Liu, Ruicong, et al.
Published: (2025)

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
by: Zhao, Canyu, et al.
Published: (2024)

Movie101v2: Improved Movie Narration Benchmark
by: Yue, Zihao, et al.
Published: (2024)

Representing Domain-Mixing Optical Degradation for Real-World Computational Aberration Correction via Vector Quantization
by: Jiang, Qi, et al.
Published: (2024)