:: Library Catalog

Bibliographic Details
Main Authors:	Galoaa, Bishoy; Moezzi, Shayda; Bai, Xiangyu; Ostadabbas, Sarah
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.18856

Similar Items

MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
by: Bai, Xiangyu, et al.
Published: (2025)

PanoWorld: Geometry-Consistent Panoramic Video World Modeling
by: Jiang, Le, et al.
Published: (2026)

Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
by: Galoaa, Bishoy, et al.
Published: (2026)

Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces
by: Galoaa, Bishoy, et al.
Published: (2025)

Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
by: Galoaa, Bishoy, et al.
Published: (2025)

HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)

Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
by: Galoaa, Bishoy, et al.
Published: (2025)

K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
by: Galoaa, Bishoy, et al.
Published: (2025)

Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
by: Jiang, Le, et al.
Published: (2025)

UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
by: Galoaa, Bishoy, et al.
Published: (2026)

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
by: Deng, Andong, et al.
Published: (2024)

Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
by: Feng, Tao, et al.
Published: (2025)

PhyGround: Benchmarking Physical Reasoning in Generative World Models
by: Lin, Juyi, et al.
Published: (2026)

Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)

Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
by: Meng, Jiahao, et al.
Published: (2025)

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation
by: Huang, Yidong, et al.
Published: (2026)

Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
by: Xu, Tianshuo, et al.
Published: (2024)

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
by: Liu, Huabin, et al.
Published: (2025)

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
by: Xue, Qiyao, et al.
Published: (2024)

Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
by: Chen, Mingfei, et al.
Published: (2025)

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
by: Luo, Mi, et al.
Published: (2025)

VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
by: Liu, Ye, et al.
Published: (2025)

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos
by: Liu, Wenqi, et al.
Published: (2026)

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)

MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering
by: Dang, Jisheng, et al.
Published: (2025)

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
by: Yu, Shoubin, et al.
Published: (2025)

Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
by: Dong, Xiaoxiang, et al.
Published: (2025)

WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling
by: Fang, Shaoheng, et al.
Published: (2025)

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
by: Yariv, Guy, et al.
Published: (2025)

TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
by: Guo, Chaohong, et al.
Published: (2025)

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
by: Zheng, Chenhao, et al.
Published: (2025)

3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
by: Liu, Dingning, et al.
Published: (2025)

Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation
by: Cai, Deli, et al.
Published: (2026)

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2026)

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2025)

Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction
by: Bai, Xue, et al.
Published: (2024)

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning
by: Li, Chenghao, et al.
Published: (2026)

Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling
by: Haque, Tasmiah, et al.
Published: (2025)