| Main Authors: | Galoaa, Bishoy, Bai, Xiangyu, Ostadabbas, Sarah |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.10617 |
Similar Items
Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
by: Galoaa, Bishoy, et al.
Published: (2025)
Motion-o: Trajectory-Grounded Video Reasoning
by: Galoaa, Bishoy, et al.
Published: (2026)
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)
Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
by: Galoaa, Bishoy, et al.
Published: (2026)
K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
by: Galoaa, Bishoy, et al.
Published: (2025)
MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
by: Bai, Xiangyu, et al.
Published: (2025)
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
by: Galoaa, Bishoy, et al.
Published: (2026)
Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
by: Galoaa, Bishoy, et al.
Published: (2025)
PanoWorld: Geometry-Consistent Panoramic Video World Modeling
by: Jiang, Le, et al.
Published: (2026)
LangBridge: Interpreting Image as a Combination of Language Embeddings
by: Liao, Jiaqi, et al.
Published: (2025)
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
by: Yin, Kangning, et al.
Published: (2024)
Multi-Modal Motion Retrieval by Learning a Fine-Grained Joint Embedding Space
by: Yu, Shiyao, et al.
Published: (2025)
JointMotion: Joint Self-Supervision for Joint Motion Prediction
by: Wagner, Royden, et al.
Published: (2024)
LangPose: Language-Aligned Motion for Robust 3D Human Pose Estimation
by: Liao, Longyun, et al.
Published: (2024)
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
by: Wang, Yuan, et al.
Published: (2024)
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
by: Chefer, Hila, et al.
Published: (2025)
SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation
by: Zhang, Guiyu, et al.
Published: (2026)
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
by: Jiang, Le, et al.
Published: (2025)
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
by: Wang, Yuan, et al.
Published: (2024)
Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning
by: Zhang, Mengtan, et al.
Published: (2025)
LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding
by: Li, Hao, et al.
Published: (2024)
Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence
by: Chen, Ling-Hao, et al.
Published: (2025)
Joint-Motion Mutual Learning for Pose Estimation in Videos
by: Wu, Sifan, et al.
Published: (2024)
MoLingo: Motion-Language Alignment for Text-to-Motion Generation
by: He, Yannan, et al.
Published: (2025)
Exploring Motion-Language Alignment for Text-driven Motion Generation
by: Gu, Ruxi, et al.
Published: (2026)
Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
by: Zuo, Yi, et al.
Published: (2024)
MotionBridge: Dynamic Video Inbetweening with Flexible Controls
by: Tanveer, Maham, et al.
Published: (2024)
JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation
by: Chen, Fangda, et al.
Published: (2025)
FlowFeat: Pixel-Dense Embedding of Motion Profiles
by: Araslanov, Nikita, et al.
Published: (2025)
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space
by: Zhang, Jiaxu, et al.
Published: (2024)
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
by: Li, Zhe, et al.
Published: (2024)
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
by: Chen, Changan, et al.
Published: (2024)
Generative Human Motion Stylization in Latent Space
by: Guo, Chuan, et al.
Published: (2024)
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
by: Zhang, Zhenghao, et al.
Published: (2025)
Controllable Long-term Motion Generation with Extended Joint Targets
by: Lee, Eunjong, et al.
Published: (2025)
IAM: Identity-Aware Human Motion and Shape Joint Generation
by: Jia, Wenqi, et al.
Published: (2026)
Modelling the Distribution of Human Motion for Sign Language Assessment
by: Cory, Oliver, et al.
Published: (2024)
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
by: Liu, Chengxu, et al.
Published: (2024)
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
by: Xiao, Lixing, et al.
Published: (2025)
EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
by: Hou, Ruibing, et al.
Published: (2026)