:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Yutong, Du, Junhao, Wang, Jiahe, Ning, Yuwei, Cao, Sihan Zhou Yang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Human-Computer Interaction; Multimedia
Online Access:	https://arxiv.org/abs/2403.17708

Similar Items

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
by: He, Xu, et al.
Published: (2024)

Editing Physiological Signals in Videos Using Latent Representations
by: Zhou, Tianwen, et al.
Published: (2025)

EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR
by: Ding, Zihao, et al.
Published: (2025)

SVFAP: Self-supervised Video Facial Affect Perceiver
by: Sun, Licai, et al.
Published: (2023)

MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)

VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space
by: Lin, David Chuan-En, et al.
Published: (2022)

BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
by: Wang, Yuhang, et al.
Published: (2026)

MindCross: Fast New Subject Adaptation with Limited Data for Cross-subject Video Reconstruction from Brain Signals
by: Liu, Xuan-Hao, et al.
Published: (2025)

Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior
by: Lin, David Chuan-En, et al.
Published: (2022)

Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection
by: Chen, Chen, et al.
Published: (2026)

AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
by: Cole, Adam, et al.
Published: (2026)

G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition
by: Deng, Kaikai, et al.
Published: (2024)

Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions
by: Rakesh, Vineet Kumar, et al.
Published: (2025)

FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
by: Kawamura, Kazuki, et al.
Published: (2024)

Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation
by: Yang, Sicheng, et al.
Published: (2025)

Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
by: Cai, Weitong, et al.
Published: (2026)

Secure & Personalized Music-to-Video Generation via CHARCHA
by: Agarwal, Mehul, et al.
Published: (2025)

ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model
by: Cheng, Luo, et al.
Published: (2025)

SentiAvatar: Towards Expressive and Interactive Digital Humans
by: Jin, Chuhao, et al.
Published: (2026)

ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)

Across-Game Engagement Modelling via Few-Shot Learning
by: Pinitas, Kosmas, et al.
Published: (2024)

Emotion Based Prediction in the Context of Optimized Trajectory Planning for Immersive Learning
by: Sungheetha, Akey, et al.
Published: (2023)

MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments
by: Tong, Yuqi, et al.
Published: (2024)

Generative Timelines for Instructed Visual Assembly
by: Pardo, Alejandro, et al.
Published: (2024)

Shu Dao: A Calligraphy Score Framework Linking Calligraphy, Music, and Performance
by: Huang, Lican
Published: (2026)

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
by: Sun, Boyuan, et al.
Published: (2025)

Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality
by: Cao, Zidong, et al.
Published: (2024)

SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
by: Lin, Bo, et al.
Published: (2024)

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos
by: Zheng, Ce, et al.
Published: (2023)

"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
by: Zhang, Ziyi, et al.
Published: (2025)

Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
by: Ki, Taekyung, et al.
Published: (2026)

WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture
by: Macario, Giuseppe
Published: (2024)

adder-viz: Real-Time Visualization Software for Transcoding Event Video
by: Freeman, Andrew C., et al.
Published: (2025)

Focus360: Guiding User Attention in Immersive Videos for VR
by: Silva, Paulo Vitor S., et al.
Published: (2026)

Unveiling the Visual Rhetoric of Persuasive Cartography: A Case Study of the Design of Octopus Maps
by: Lin, Daocheng, et al.
Published: (2025)

Soundify: Matching Sound Effects to Video
by: Lin, David Chuan-En, et al.
Published: (2021)

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
by: Yin, Wanqi, et al.
Published: (2025)

MV-Crafter: An Intelligent System for Music-guided Video Generation
by: Chen, Chuer, et al.
Published: (2025)

ComVi: Context-Aware Optimized Comment Display in Video Playback
by: Kim, Minsun, et al.
Published: (2026)