Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xuling; Zhang, Ziru; Wang, Yuyang; Lee, Lik-hang; Hui, Pan
Format:	Preprint
Published:	2024
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2407.00925

Similar Items

Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture
by: Jin, Yitong, et al.
Published: (2024)

StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
by: Huang, Yiheng, et al.
Published: (2024)

MotionPro: A Precise Motion Controller for Image-to-Video Generation
by: Zhang, Zhongwei, et al.
Published: (2025)

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding
by: Wang, Shaoguang, et al.
Published: (2026)

AMD: Autoregressive Motion Diffusion
by: Han, Bo, et al.
Published: (2023)

TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations
by: Cakmak, Mert Can, et al.
Published: (2025)

Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction
by: Yuan, Xiang, et al.
Published: (2025)

Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion
by: Wang, Xinghan, et al.
Published: (2024)

Efficient Sub-pixel Motion Compensation in Learned Video Codecs
by: Ladune, Théo, et al.
Published: (2025)

Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets
by: Wu, Xinyi, et al.
Published: (2025)

PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation
by: Zhao, Sihan, et al.
Published: (2025)

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)

MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding
by: Wang, Xuanchen, et al.
Published: (2025)

AV1 Motion Vector Fidelity and Application for Efficient Optical Flow
by: Zouein, Julien, et al.
Published: (2025)

Multimodal Cyber-physical Interaction in XR: Hybrid Doctoral Thesis Defense
by: Alhilal, Ahmad, et al.
Published: (2026)

KeyVideoLLM: Towards Large-scale Video Keyframe Selection
by: Liang, Hao, et al.
Published: (2024)

Mesquite MoCap: Democratizing Real-Time Motion Capture with Affordable, Bodyworn IoT Sensors and WebXR SLAM
by: Vanani, Poojan, et al.
Published: (2025)

PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis
by: Jin, Chuhao, et al.
Published: (2025)

Human Motion Video Generation: A Survey
by: Xue, Haiwei, et al.
Published: (2025)

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
by: Li, Quanhao, et al.
Published: (2025)

Compression Metadata-assisted RoI Extraction and Adaptive Inference for Efficient Video Analytics
by: Wang, Chengzhi, et al.
Published: (2025)

SpeechEE: A Novel Benchmark for Speech Event Extraction
by: Wang, Bin, et al.
Published: (2024)

DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis
by: Wang, Zixuan, et al.
Published: (2024)

MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
by: Wang, Yilin, et al.
Published: (2025)

LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)

MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
by: Li, Junjie, et al.
Published: (2025)

Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
by: Zhang, Meishan, et al.
Published: (2024)

REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
by: Wu, Di, et al.
Published: (2025)

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
by: Wang, Zhouxia, et al.
Published: (2023)

PersoNo: Personalised Notification Urgency Classifier in Mixed Reality
by: Zheng, Jingyao, et al.
Published: (2025)

ViMo: Generating Motions from Casual Videos
by: Qiu, Liangdong, et al.
Published: (2024)

FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis
by: Panagidi, Kakia, et al.
Published: (2026)

Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
by: Zhang, Zongye, et al.
Published: (2025)

OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction
by: Li, Yang, et al.
Published: (2025)

Generating Attribute-Aware Human Motions from Textual Prompt
by: Wang, Xinghan, et al.
Published: (2025)

Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
by: Xing, Fuyu, et al.
Published: (2025)

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
by: Cheng, Shihao, et al.
Published: (2026)

KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
by: Lyu, Tianle, et al.
Published: (2025)

CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)