| Main Authors: | Galoaa, Bishoy; Ostadabbas, Sarah |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10607 |
Similar Items
Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces
by: Galoaa, Bishoy, et al.
Published: (2025)
Motion-o: Trajectory-Grounded Video Reasoning
by: Galoaa, Bishoy, et al.
Published: (2026)
K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
by: Galoaa, Bishoy, et al.
Published: (2025)
Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
by: Galoaa, Bishoy, et al.
Published: (2026)
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)
MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
by: Bai, Xiangyu, et al.
Published: (2025)
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
by: Galoaa, Bishoy, et al.
Published: (2026)
Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
by: Galoaa, Bishoy, et al.
Published: (2025)
PanoWorld: Geometry-Consistent Panoramic Video World Modeling
by: Jiang, Le, et al.
Published: (2026)
Segment Any Motion in Videos
by: Huang, Nan, et al.
Published: (2025)
Motion Anything: Any to Motion Generation
by: Zhang, Zeyu, et al.
Published: (2025)
Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation
by: Atzmon, Yuval, et al.
Published: (2024)
Encoder-Free Human Motion Understanding via Structured Motion Descriptions
by: Zhang, Yao, et al.
Published: (2026)
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)
Guided Attention for Interpretable Motion Captioning
by: Radouane, Karim, et al.
Published: (2023)
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
by: Li, Zhe, et al.
Published: (2024)
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
by: Ling, Pengyang, et al.
Published: (2024)
FlowMotion: Training-Free Flow Guidance for Video Motion Transfer
by: Wang, Zhen, et al.
Published: (2026)
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
by: Jiang, Le, et al.
Published: (2025)
AnyAct: Towards Human Reenactment of Character Motion From Video
by: Chen, Liuhan, et al.
Published: (2026)
MotionTrack: Learning Motion Predictor for Multiple Object Tracking
by: Xiao, Changcheng, et al.
Published: (2023)
Towards Understanding Camera Motions in Any Video
by: Lin, Zhiqiu, et al.
Published: (2025)
OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward
by: Zhong, Chunlin, et al.
Published: (2025)
Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
by: Ji, Lichuan, et al.
Published: (2024)
AnyI2V: Animating Any Conditional Image with Motion Control
by: Li, Ziye, et al.
Published: (2025)
DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild
by: Ye, Weicai, et al.
Published: (2024)
VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)
Beyond Caption-Based Queries for Video Moment Retrieval
by: Pujol-Perich, David, et al.
Published: (2026)
Generative Video Motion Editing with 3D Point Tracks
by: Lee, Yao-Chih, et al.
Published: (2025)
AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion
by: Li, Hongjie, et al.
Published: (2026)
OmniControl: Control Any Joint at Any Time for Human Motion Generation
by: Xie, Yiming, et al.
Published: (2023)
MoCHA: Denoising Caption Supervision for Motion-Text Retrieval
by: Warner, Nikolai, et al.
Published: (2026)
Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
by: Zhang, Xinyu, et al.
Published: (2025)
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer
by: Shi, Qingyu, et al.
Published: (2025)
Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation
by: Fei, Yang, et al.
Published: (2025)
Transformer with Controlled Attention for Synchronous Motion Captioning
by: Radouane, Karim, et al.
Published: (2024)
AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling
by: Li, Yiheng, et al.
Published: (2026)
Motion Prompting: Controlling Video Generation with Motion Trajectories
by: Geng, Daniel, et al.
Published: (2024)
FingerCap: Fine-grained Finger-level Hand Motion Captioning
by: Shen, Xin, et al.
Published: (2025)