Library Catalog

Bibliographic Details
Main Authors:	Galoaa, Bishoy; Bai, Xiangyu; Ostadabbas, Sarah
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.10617

Similar Items

Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
by: Galoaa, Bishoy, et al.
Published: (2025)

Motion-o: Trajectory-Grounded Video Reasoning
by: Galoaa, Bishoy, et al.
Published: (2026)

HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)

Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
by: Galoaa, Bishoy, et al.
Published: (2026)

K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
by: Galoaa, Bishoy, et al.
Published: (2025)

MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
by: Bai, Xiangyu, et al.
Published: (2025)

UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
by: Galoaa, Bishoy, et al.
Published: (2026)

Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
by: Galoaa, Bishoy, et al.
Published: (2025)

PanoWorld: Geometry-Consistent Panoramic Video World Modeling
by: Jiang, Le, et al.
Published: (2026)

LangBridge: Interpreting Image as a Combination of Language Embeddings
by: Liao, Jiaqi, et al.
Published: (2025)

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
by: Yin, Kangning, et al.
Published: (2024)

Multi-Modal Motion Retrieval by Learning a Fine-Grained Joint Embedding Space
by: Yu, Shiyao, et al.
Published: (2025)

JointMotion: Joint Self-Supervision for Joint Motion Prediction
by: Wagner, Royden, et al.
Published: (2024)

LangPose: Language-Aligned Motion for Robust 3D Human Pose Estimation
by: Liao, Longyun, et al.
Published: (2024)

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
by: Wang, Yuan, et al.
Published: (2024)

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
by: Chefer, Hila, et al.
Published: (2025)

SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation
by: Zhang, Guiyu, et al.
Published: (2026)

Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
by: Jiang, Le, et al.
Published: (2025)

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
by: Wang, Yuan, et al.
Published: (2024)

Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning
by: Zhang, Mengtan, et al.
Published: (2025)

LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding
by: Li, Hao, et al.
Published: (2024)

Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence
by: Chen, Ling-Hao, et al.
Published: (2025)

Joint-Motion Mutual Learning for Pose Estimation in Videos
by: Wu, Sifan, et al.
Published: (2024)

MoLingo: Motion-Language Alignment for Text-to-Motion Generation
by: He, Yannan, et al.
Published: (2025)

Exploring Motion-Language Alignment for Text-driven Motion Generation
by: Gu, Ruxi, et al.
Published: (2026)

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
by: Zuo, Yi, et al.
Published: (2024)

MotionBridge: Dynamic Video Inbetweening with Flexible Controls
by: Tanveer, Maham, et al.
Published: (2024)

JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation
by: Chen, Fangda, et al.
Published: (2025)

FlowFeat: Pixel-Dense Embedding of Motion Profiles
by: Araslanov, Nikita, et al.
Published: (2025)

Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space
by: Zhang, Jiaxu, et al.
Published: (2024)

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
by: Li, Zhe, et al.
Published: (2024)

The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
by: Chen, Changan, et al.
Published: (2024)

Generative Human Motion Stylization in Latent Space
by: Guo, Chuan, et al.
Published: (2024)

Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
by: Zhang, Zhenghao, et al.
Published: (2025)

Controllable Long-term Motion Generation with Extended Joint Targets
by: Lee, Eunjong, et al.
Published: (2025)

IAM: Identity-Aware Human Motion and Shape Joint Generation
by: Jia, Wenqi, et al.
Published: (2026)

Modelling the Distribution of Human Motion for Sign Language Assessment
by: Cory, Oliver, et al.
Published: (2024)

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
by: Liu, Chengxu, et al.
Published: (2024)

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
by: Xiao, Lixing, et al.
Published: (2025)

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
by: Hou, Ruibing, et al.
Published: (2026)