| Main Authors: | Cheng, Tongtong, Li, Rongzhen, Xiong, Yixin, Zhang, Tao, Wang, Jing, Liu, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.06072 |
Similar Items
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space
by: Zhu, Jian, et al.
Published: (2025)
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
by: Fang, Jianwu, et al.
Published: (2024)
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
by: Guo, Mingzhe, et al.
Published: (2026)
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
by: Zhang, Haoyu, et al.
Published: (2025)
Understanding Attention Mechanism in Video Diffusion Models
by: Liu, Bingyan, et al.
Published: (2025)
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
by: Zhu, Bingwen, et al.
Published: (2026)
EgoVLM: Policy Optimization for Egocentric Video Understanding
by: Vinod, Ashwin, et al.
Published: (2025)
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
by: Xu, Jilan, et al.
Published: (2025)
Intention-driven Ego-to-Exo Video Generation
by: Luo, Hongchen, et al.
Published: (2024)
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
by: Zhang, Yulin, et al.
Published: (2025)
EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding
by: Sun, Shitong, et al.
Published: (2026)
EgoExo-WM: Unlocking Exo Video for Ego World Models
by: Tran, Danny, et al.
Published: (2026)
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
by: Cheng, Sijie, et al.
Published: (2024)
Predicting Video Slot Attention Queries from Random Slot-Feature Pairs
by: Zhao, Rongzhen, et al.
Published: (2025)
Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization
by: Zhao, Rongzhen, et al.
Published: (2026)
UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving
by: Xiong, Zhexiao, et al.
Published: (2026)
Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
by: Gao, Hongcheng, et al.
Published: (2025)
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)
EgoAVU: Egocentric Audio-Visual Understanding
by: Seth, Ashish, et al.
Published: (2026)
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
by: Zhang, Daiwei, et al.
Published: (2024)
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
by: Zhang, Boqiang, et al.
Published: (2025)
Minerva-Ego: Spatiotemporal Hints for Egocentric Video Understanding
by: Nagrani, Arsha, et al.
Published: (2026)
EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation
by: Leonardi, Rosario, et al.
Published: (2026)
Cycle Consistency in Video Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2026)
Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving
by: Wang, Linhan, et al.
Published: (2026)
SurveillanceVQA-589K: A Benchmark for Comprehensive Surveillance Video-Language Understanding with Large Models
by: Liu, Bo, et al.
Published: (2025)
EgoExo-Con: Exploring View-Invariant Video Temporal Understanding
by: Jung, Minjoon, et al.
Published: (2025)
Understanding Long Videos with Multimodal Language Models
by: Ranasinghe, Kanchana, et al.
Published: (2024)
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
by: Wu, Peiran, et al.
Published: (2025)
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2025)
Ego-InBetween: Generating Object State Transitions in Ego-Centric Videos
by: Ge, Mengmeng, et al.
Published: (2026)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
by: Wu, Tz-Ying, et al.
Published: (2024)
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
by: Chen, Meiqi, et al.
Published: (2024)
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
by: Wang, Yi, et al.
Published: (2024)
CauCLIP: Bridging the Sim-to-Real Gap in Surgical Video Understanding via Causality-Inspired Vision-Language Modeling
by: He, Yuxin, et al.
Published: (2026)
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
by: Pei, Baoqi, et al.
Published: (2024)
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
by: Ye, Hanrong, et al.
Published: (2024)