| Main Authors: | Zhang, Xuling; Zhang, Ziru; Wang, Yuyang; Lee, Lik-hang; Hui, Pan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.00925 |
Similar Items
Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture
by: Jin, Yitong, et al.
Published: (2024)
StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
by: Huang, Yiheng, et al.
Published: (2024)
MotionPro: A Precise Motion Controller for Image-to-Video Generation
by: Zhang, Zhongwei, et al.
Published: (2025)
MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)
Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding
by: Wang, Shaoguang, et al.
Published: (2026)
AMD: Autoregressive Motion Diffusion
by: Han, Bo, et al.
Published: (2023)
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations
by: Cakmak, Mert Can, et al.
Published: (2025)
Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction
by: Yuan, Xiang, et al.
Published: (2025)
Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion
by: Wang, Xinghan, et al.
Published: (2024)
Efficient Sub-pixel Motion Compensation in Learned Video Codecs
by: Ladune, Théo, et al.
Published: (2025)
Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets
by: Wu, Xinyi, et al.
Published: (2025)
PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation
by: Zhao, Sihan, et al.
Published: (2025)
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)
MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding
by: Wang, Xuanchen, et al.
Published: (2025)
AV1 Motion Vector Fidelity and Application for Efficient Optical Flow
by: Zouein, Julien, et al.
Published: (2025)
Multimodal Cyber-physical Interaction in XR: Hybrid Doctoral Thesis Defense
by: Alhilal, Ahmad, et al.
Published: (2026)
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
by: Liang, Hao, et al.
Published: (2024)
Mesquite MoCap: Democratizing Real-Time Motion Capture with Affordable, Bodyworn IoT Sensors and WebXR SLAM
by: Vanani, Poojan, et al.
Published: (2025)
PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis
by: Jin, Chuhao, et al.
Published: (2025)
Human Motion Video Generation: A Survey
by: Xue, Haiwei, et al.
Published: (2025)
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2025)
Compression Metadata-assisted RoI Extraction and Adaptive Inference for Efficient Video Analytics
by: Wang, Chengzhi, et al.
Published: (2025)
SpeechEE: A Novel Benchmark for Speech Event Extraction
by: Wang, Bin, et al.
Published: (2024)
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis
by: Wang, Zixuan, et al.
Published: (2024)
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
by: Wang, Yilin, et al.
Published: (2025)
LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
by: Li, Junjie, et al.
Published: (2025)
Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
by: Zhang, Meishan, et al.
Published: (2024)
REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
by: Wu, Di, et al.
Published: (2025)
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
by: Wang, Zhouxia, et al.
Published: (2023)
PersoNo: Personalised Notification Urgency Classifier in Mixed Reality
by: Zheng, Jingyao, et al.
Published: (2025)
ViMo: Generating Motions from Casual Videos
by: Qiu, Liangdong, et al.
Published: (2024)
FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis
by: Panagidi, Kakia, et al.
Published: (2026)
Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
by: Zhang, Zongye, et al.
Published: (2025)
OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction
by: Li, Yang, et al.
Published: (2025)
Generating Attribute-Aware Human Motions from Textual Prompt
by: Wang, Xinghan, et al.
Published: (2025)
Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
by: Xing, Fuyu, et al.
Published: (2025)
Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
by: Cheng, Shihao, et al.
Published: (2026)
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
by: Lyu, Tianle, et al.
Published: (2025)
CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)