:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Binjie; Shou, Mike Zheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.19852

Similar Items

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
by: Xu, Jilan, et al.
Published: (2025)

Long-Context Autoregressive Video Modeling with Next-Frame Prediction
by: Gu, Yuchao, et al.
Published: (2025)

ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands
by: Hu, Siyuan, et al.
Published: (2025)

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
by: Bai, Zechen, et al.
Published: (2024)

Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)

TPDiff: Temporal Pyramid Video Diffusion Model
by: Ran, Lingmin, et al.
Published: (2025)

D-AR: Diffusion via Autoregressive Models
by: Gao, Ziteng, et al.
Published: (2025)

The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation
by: Hatano, Masashi, et al.
Published: (2025)

PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge
by: Chen, Feng, et al.
Published: (2024)

Encore: Conditioning Trajectory Forecasting via Biased Ego Rehearsals
by: Wong, Conghao, et al.
Published: (2026)

Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)

TrajMamba: An Ego-Motion-Guided Mamba Model for Pedestrian Trajectory Prediction from an Egocentric Perspective
by: Peng, Yusheng, et al.
Published: (2026)

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
by: Huang, Yifei, et al.
Published: (2024)

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
by: Lin, Kevin Qinghong, et al.
Published: (2025)

EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
by: Xie, Binzhu, et al.
Published: (2026)

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
by: Akbiyik, M. Eren, et al.
Published: (2023)

Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach
by: Sharma, Sushil, et al.
Published: (2023)

EgoNav: Egocentric Scene-aware Human Trajectory Prediction
by: Wang, Weizhuo, et al.
Published: (2024)

P-Flow: Prompting Visual Effects Generation
by: Zhao, Rui, et al.
Published: (2026)

Show-o2: Improved Native Unified Multimodal Models
by: Xie, Jinheng, et al.
Published: (2025)

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
by: Song, Yiren, et al.
Published: (2024)

Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
by: Shi, Yiqing, et al.
Published: (2025)

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
by: Zhao, Rui, et al.
Published: (2025)

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
by: Mao, Weijia, et al.
Published: (2025)

SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
by: Mei, Haiyang, et al.
Published: (2025)

Estimating Body and Hand Motion in an Ego-sensed World
by: Yi, Brent, et al.
Published: (2024)

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
by: Zhang, David Junhao, et al.
Published: (2024)

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework
by: Zhang, Yani, et al.
Published: (2025)

EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction
by: Hara, Ryosei, et al.
Published: (2025)

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
by: Bai, Zechen, et al.
Published: (2025)

Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction
by: Ma, Junyi, et al.
Published: (2025)

EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations
by: Liu, Jiayi, et al.
Published: (2025)

EgoSpot: Egocentric Multimodal Control for Hands-Free Mobile Manipulation
by: Zhang, Ganlin, et al.
Published: (2023)

Multi-human Interactive Talking Dataset
by: Zhu, Zeyu, et al.
Published: (2025)

PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
by: Yang, Zhiwei, et al.
Published: (2025)

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
by: Song, Yiren, et al.
Published: (2025)

Impossible Videos
by: Bai, Zechen, et al.
Published: (2025)

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
by: Song, Yiren, et al.
Published: (2025)

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
by: Song, Yiren, et al.
Published: (2025)

Automated Movie Generation via Multi-Agent CoT Planning
by: Wu, Weijia, et al.
Published: (2025)