| Main Authors: | Galoaa, Bishoy, Moezzi, Shayda, Bai, Xiangyu, Ostadabbas, Sarah |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.18856 |
Similar Items
MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
by: Bai, Xiangyu, et al.
Published: (2025)
PanoWorld: Geometry-Consistent Panoramic Video World Modeling
by: Jiang, Le, et al.
Published: (2026)
Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
by: Galoaa, Bishoy, et al.
Published: (2026)
Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces
by: Galoaa, Bishoy, et al.
Published: (2025)
Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
by: Galoaa, Bishoy, et al.
Published: (2025)
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)
Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
by: Galoaa, Bishoy, et al.
Published: (2025)
K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
by: Galoaa, Bishoy, et al.
Published: (2025)
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
by: Jiang, Le, et al.
Published: (2025)
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
by: Galoaa, Bishoy, et al.
Published: (2026)
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
by: Deng, Andong, et al.
Published: (2024)
Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
by: Feng, Tao, et al.
Published: (2025)
PhyGround: Benchmarking Physical Reasoning in Generative World Models
by: Lin, Juyi, et al.
Published: (2026)
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)
Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
by: Meng, Jiahao, et al.
Published: (2025)
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation
by: Huang, Yidong, et al.
Published: (2026)
Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
by: Xu, Tianshuo, et al.
Published: (2024)
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
by: Liu, Huabin, et al.
Published: (2025)
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
by: Xue, Qiyao, et al.
Published: (2024)
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
by: Chen, Mingfei, et al.
Published: (2025)
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
by: Luo, Mi, et al.
Published: (2025)
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
by: Liu, Ye, et al.
Published: (2025)
VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos
by: Liu, Wenqi, et al.
Published: (2026)
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)
MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering
by: Dang, Jisheng, et al.
Published: (2025)
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
by: Yu, Shoubin, et al.
Published: (2025)
Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
by: Dong, Xiaoxiang, et al.
Published: (2025)
WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling
by: Fang, Shaoheng, et al.
Published: (2025)
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
by: Yariv, Guy, et al.
Published: (2025)
TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
by: Guo, Chaohong, et al.
Published: (2025)
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
by: Zheng, Chenhao, et al.
Published: (2025)
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
by: Liu, Dingning, et al.
Published: (2025)
Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation
by: Cai, Deli, et al.
Published: (2026)
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2026)
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2025)
Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction
by: Bai, Xue, et al.
Published: (2024)
IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning
by: Li, Chenghao, et al.
Published: (2026)
Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling
by: Haque, Tasmiah, et al.
Published: (2025)