| Main Authors: | Zhang, Binjie, Shou, Mike Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.19852 |
Similar Items
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
by: Xu, Jilan, et al.
Published: (2025)
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
by: Gu, Yuchao, et al.
Published: (2025)
ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands
by: Hu, Siyuan, et al.
Published: (2025)
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
by: Bai, Zechen, et al.
Published: (2024)
Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)
TPDiff: Temporal Pyramid Video Diffusion Model
by: Ran, Lingmin, et al.
Published: (2025)
D-AR: Diffusion via Autoregressive Models
by: Gao, Ziteng, et al.
Published: (2025)
The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation
by: Hatano, Masashi, et al.
Published: (2025)
PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge
by: Chen, Feng, et al.
Published: (2024)
Encore: Conditioning Trajectory Forecasting via Biased Ego Rehearsals
by: Wong, Conghao, et al.
Published: (2026)
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)
TrajMamba: An Ego-Motion-Guided Mamba Model for Pedestrian Trajectory Prediction from an Egocentric Perspective
by: Peng, Yusheng, et al.
Published: (2026)
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
by: Huang, Yifei, et al.
Published: (2024)
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
by: Lin, Kevin Qinghong, et al.
Published: (2025)
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
by: Xie, Binzhu, et al.
Published: (2026)
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
by: Akbiyik, M. Eren, et al.
Published: (2023)
Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach
by: Sharma, Sushil, et al.
Published: (2023)
EgoNav: Egocentric Scene-aware Human Trajectory Prediction
by: Wang, Weizhuo, et al.
Published: (2024)
P-Flow: Prompting Visual Effects Generation
by: Zhao, Rui, et al.
Published: (2026)
Show-o2: Improved Native Unified Multimodal Models
by: Xie, Jinheng, et al.
Published: (2025)
DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
by: Song, Yiren, et al.
Published: (2024)
Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
by: Shi, Yiqing, et al.
Published: (2025)
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
by: Zhao, Rui, et al.
Published: (2025)
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
by: Mao, Weijia, et al.
Published: (2025)
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
by: Mei, Haiyang, et al.
Published: (2025)
Estimating Body and Hand Motion in an Ego-sensed World
by: Yi, Brent, et al.
Published: (2024)
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
by: Zhang, David Junhao, et al.
Published: (2024)
DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework
by: Zhang, Yani, et al.
Published: (2025)
EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction
by: Hara, Ryosei, et al.
Published: (2025)
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
by: Bai, Zechen, et al.
Published: (2025)
Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction
by: Ma, Junyi, et al.
Published: (2025)
EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations
by: Liu, Jiayi, et al.
Published: (2025)
EgoSpot: Egocentric Multimodal Control for Hands-Free Mobile Manipulation
by: Zhang, Ganlin, et al.
Published: (2023)
Multi-human Interactive Talking Dataset
by: Zhu, Zeyu, et al.
Published: (2025)
PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
by: Yang, Zhiwei, et al.
Published: (2025)
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
by: Song, Yiren, et al.
Published: (2025)
Impossible Videos
by: Bai, Zechen, et al.
Published: (2025)
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
by: Song, Yiren, et al.
Published: (2025)
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
by: Song, Yiren, et al.
Published: (2025)
Automated Movie Generation via Multi-Agent CoT Planning
by: Wu, Weijia, et al.
Published: (2025)