| Main Authors: | Sun, Pengzhan, Xiao, Junbin, Tse, Tze Ho Elden, Li, Yicong, Akula, Arjun, Yao, Angela |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.13621 |
Similar Items
Improving Human Motion Plausibility with Body Momentum
by: Nguyen, Ha Linh, et al.
Published: (2025)
DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction
by: Xu, Kai, et al.
Published: (2024)
Can I Trust Your Answer? Visually Grounded Video Question Answering
by: Xiao, Junbin, et al.
Published: (2023)
A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds
by: Peng, Jizong, et al.
Published: (2025)
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
by: Yang, Fengyuan, et al.
Published: (2024)
Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation
by: Liu, Ruicong, et al.
Published: (2025)
TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction
by: Huang, Yiyao, et al.
Published: (2025)
Ego-Grounding for Personalized Question-Answering in Egocentric Videos
by: Xiao, Junbin, et al.
Published: (2026)
SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain
by: Xiong, Butian, et al.
Published: (2024)
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
by: Zhou, Sheng, et al.
Published: (2025)
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
by: Tse, Tze Ho Elden, et al.
Published: (2025)
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
by: Zheng, Linfang, et al.
Published: (2024)
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
by: Feng, Runyang, et al.
Published: (2025)
Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning
by: Cheng, Qinchuan, et al.
Published: (2026)
EgoBlind: Towards Egocentric Visual Assistance for the Blind
by: Xiao, Junbin, et al.
Published: (2025)
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
by: Zhao, Zhuoran, et al.
Published: (2025)
Question-Answering Dense Video Events
by: Qin, Hangyu, et al.
Published: (2024)
TeleEgo: Benchmarking Egocentric AI Assistants in the Wild
by: Yan, Jiaqi, et al.
Published: (2025)
On the Consistency of Video Large Language Models in Temporal Comprehension
by: Jung, Minjoon, et al.
Published: (2024)
EgoLife: Towards Egocentric Life Assistant
by: Yang, Jingkang, et al.
Published: (2025)
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
by: Li, Lei-lei, et al.
Published: (2025)
Scene-Text Grounding for Text-Based Video Question Answering
by: Zhou, Sheng, et al.
Published: (2024)
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
by: Xiao, Junbin, et al.
Published: (2026)
VPG: Visual Prefix Guidance for Autoregressive Image and Video Generation
by: Liao, Xinyao, et al.
Published: (2026)
Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs
by: Chen, Tianle, et al.
Published: (2025)
Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges
by: Li, Junlong, et al.
Published: (2025)
VideoQA in the Era of LLMs: An Empirical Study
by: Xiao, Junbin, et al.
Published: (2024)
Intention-Conditioned Long-Term Human Egocentric Action Forecasting
by: Mascaro, Esteve Valls, et al.
Published: (2022)
EgoSelf: From Memory to Personalized Egocentric Assistant
by: Wang, Yanshuo, et al.
Published: (2026)
EgoExo-Con: Exploring View-Invariant Video Temporal Understanding
by: Jung, Minjoon, et al.
Published: (2025)
ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition
by: Kundu, Sanjoy, et al.
Published: (2024)
REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
by: He, Qiyuan, et al.
Published: (2025)
Grounded Question-Answering in Long Egocentric Videos
by: Di, Shangzhe, et al.
Published: (2023)
AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation
by: Zhu, Jiayin, et al.
Published: (2025)
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
by: Li, Xunsong, et al.
Published: (2024)
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
by: Huang, Yifei, et al.
Published: (2024)
HRVDA: High-Resolution Visual Document Assistant
by: Liu, Chaohu, et al.
Published: (2024)
Fine-grained Spatiotemporal Grounding on Egocentric Videos
by: Liang, Shuo, et al.
Published: (2025)
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
by: Kundu, Sanjoy, et al.
Published: (2023)
WHOLE: World-Grounded Hand-Object Lifted from Egocentric Videos
by: Ye, Yufei, et al.
Published: (2026)