| Main Authors: | Dong, Ze; Shi, Hao; Gao, Zejia; Yi, Zhonghua; Wang, Kaiwei; Wang, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.15823 |
Similar Items
Benchmarking the Robustness of Optical Flow Estimation to Corruptions
by: Yi, Zhonghua, et al.
Published: (2024)
EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data
by: Yi, Zhonghua, et al.
Published: (2024)
Learning to Highlight Audio by Watching Movies
by: Huang, Chao, et al.
Published: (2025)
AoE: Always-on Egocentric Human Video Collection for Embodied AI
by: Yang, Bowen, et al.
Published: (2026)
EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras
by: Wang, Luming, et al.
Published: (2026)
EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera
by: Wang, Luming, et al.
Published: (2025)
Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers
by: Jiang, Qi, et al.
Published: (2023)
EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
by: Fan, Yue, et al.
Published: (2024)
Exploring Event-based Human Pose Estimation with 3D Event Representations
by: Yin, Xiaoting, et al.
Published: (2023)
Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer
by: Shi, Hao, et al.
Published: (2022)
LF-PGVIO: A Visual-Inertial-Odometry Framework for Large Field-of-View Cameras using Points and Geodesic Segments
by: Wang, Ze, et al.
Published: (2023)
OmniLens: Towards Universal Lens Aberration Correction via LensLib-to-Specific Domain Adaptation
by: Jiang, Qi, et al.
Published: (2024)
Egocentric Human-Object Interaction Detection: A New Benchmark and Method
by: Deng, Kunyuan, et al.
Published: (2025)
Character-Centric Understanding of Animated Movies
by: Gui, Zhongrui, et al.
Published: (2025)
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
by: Huang, Yifei, et al.
Published: (2024)
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
by: Cheng, Sijie, et al.
Published: (2024)
OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
by: Shi, Hao, et al.
Published: (2025)
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
by: Zhang, Haoyu, et al.
Published: (2025)
Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction
by: Yin, Xiaoting, et al.
Published: (2025)
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
by: Wu, Weijia, et al.
Published: (2024)
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
by: Suglia, Alessandro, et al.
Published: (2024)
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
by: Wang, Jian, et al.
Published: (2025)
WAT: Online Video Understanding Needs Watching Before Thinking
by: Han, Zifan, et al.
Published: (2026)
LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model
by: Gao, Quankai, et al.
Published: (2026)
EgoAVU: Egocentric Audio-Visual Understanding
by: Seth, Ashish, et al.
Published: (2026)
P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty
by: Zhang, Yufan, et al.
Published: (2024)
OPTIAGENT: A Physics-Driven Agentic Framework for Automated Optical Design
by: Geng, Yuyu, et al.
Published: (2026)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
by: Zhu, Bingwen, et al.
Published: (2026)
S$^{5}$Mars: Semi-Supervised Learning for Mars Semantic Segmentation
by: Zhang, Jiahang, et al.
Published: (2022)
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
by: Dang, Ronghao, et al.
Published: (2025)
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
by: Wang, Xiaoqi, et al.
Published: (2025)
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
by: Zaranis, Emmanouil, et al.
Published: (2025)
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
by: Liu, Yunze, et al.
Published: (2022)
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
by: Li, Di, et al.
Published: (2025)
SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting
by: Liu, Ruicong, et al.
Published: (2025)
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
by: Zhao, Canyu, et al.
Published: (2024)
Movie101v2: Improved Movie Narration Benchmark
by: Yue, Zihao, et al.
Published: (2024)
Representing Domain-Mixing Optical Degradation for Real-World Computational Aberration Correction via Vector Quantization
by: Jiang, Qi, et al.
Published: (2024)