:: Library Catalog

Bibliographic Details
Main Authors:	Etaat, Daniel; Kalaria, Dvij; Rahmanian, Nima; Sastry, Shankar
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.20936

Similar Items

TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos
by: Rahmanian, Nima, et al.
Published: (2026)

Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
by: Liao, Haicheng, et al.
Published: (2024)

Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer
by: Kienzle, Daniel, et al.
Published: (2025)

On Moving Object Segmentation from Monocular Video with Transformers
by: Homeyer, Christian, et al.
Published: (2024)

α-RACER: Real-Time Algorithm for Game-Theoretic Motion Planning and Control in Autonomous Racing using Near-Potential Function
by: Kalaria, Dvij, et al.
Published: (2024)

GFlow: Recovering 4D World from Monocular Video
by: Wang, Shizun, et al.
Published: (2024)

Towards Scene Graph Anticipation
by: Peddi, Rohith, et al.
Published: (2024)

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
by: Zuo, Qi, et al.
Published: (2024)

MV-S2V: Multi-View Subject-Consistent Video Generation
by: Song, Ziyang, et al.
Published: (2026)

Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation
by: Luo, Yuanhao, et al.
Published: (2026)

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
by: Lv, Zhen, et al.
Published: (2024)

iHuman: Instant Animatable Digital Humans From Monocular Videos
by: Paudel, Pramish, et al.
Published: (2024)

Automated Tennis Player and Ball Tracking with Court Keypoints Detection (Hawk Eye System)
by: Desu, Venkata Manikanta, et al.
Published: (2025)

Semantically Guided Representation Learning For Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video
by: Liu, Yu, et al.
Published: (2025)

Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video
by: Guo, Jiaxin, et al.
Published: (2025)

MV-RAG: Retrieval Augmented Multiview Diffusion
by: Dayani, Yosef, et al.
Published: (2025)

Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge
by: Cho, Hyunjin, et al.
Published: (2024)

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos
by: Liang, Hanxue, et al.
Published: (2024)

Semantically Guided Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer
by: Sarker, Sushmita, et al.
Published: (2024)

4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos
by: Guo, Mengqi, et al.
Published: (2025)

Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video
by: Alghamdi, Eyad, et al.
Published: (2026)

Predict and Resist: Long-Term Accident Anticipation under Sensor Noise
by: Liu, Xingcheng, et al.
Published: (2025)

Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation
by: Chu, Qiaohui, et al.
Published: (2025)

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
by: Tang, Jiapeng, et al.
Published: (2024)

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
by: Ban, Yuanhao, et al.
Published: (2024)

EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
by: Fang, Jianwu, et al.
Published: (2025)

LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
by: Yu, Mark, et al.
Published: (2025)

MsFIN: Multi-scale Feature Interaction Network for Traffic Accident Anticipation
by: Wu, Tongshuai, et al.
Published: (2025)

Focusable Monocular Depth Estimation
by: Du, Yuxin, et al.
Published: (2026)

GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video
by: Chen, Jingxuan
Published: (2024)

DiffVAS: Diffusion-Guided Visual Active Search in Partially Observable Environments
by: Sarkar, Anindya, et al.
Published: (2026)

GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video
by: Sharma, Arun
Published: (2026)

LATTE: Learning to Think with Vision Specialists
by: Ma, Zixian, et al.
Published: (2024)

Comparing Learning Paradigms for Egocentric Video Summarization
by: Wen, Daniel
Published: (2025)

Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025
by: Chu, Qiaohui, et al.
Published: (2025)

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
by: Tang, Zhenggang, et al.
Published: (2024)

MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
by: Cheng, Jintao, et al.
Published: (2024)