| Main Authors: | Xu, Yutong, Du, Junhao, Wang, Jiahe, Ning, Yuwei, Cao, Sihan Zhou Yang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.17708 |
Similar Items
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
by: He, Xu, et al.
Published: (2024)
Editing Physiological Signals in Videos Using Latent Representations
by: Zhou, Tianwen, et al.
Published: (2025)
EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR
by: Ding, Zihao, et al.
Published: (2025)
SVFAP: Self-supervised Video Facial Affect Perceiver
by: Sun, Licai, et al.
Published: (2023)
MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)
VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space
by: Lin, David Chuan-En, et al.
Published: (2022)
BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
by: Wang, Yuhang, et al.
Published: (2026)
MindCross: Fast New Subject Adaptation with Limited Data for Cross-subject Video Reconstruction from Brain Signals
by: Liu, Xuan-Hao, et al.
Published: (2025)
Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior
by: Lin, David Chuan-En, et al.
Published: (2022)
Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection
by: Chen, Chen, et al.
Published: (2026)
AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
by: Cole, Adam, et al.
Published: (2026)
G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition
by: Deng, Kaikai, et al.
Published: (2024)
Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions
by: Rakesh, Vineet Kumar, et al.
Published: (2025)
FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
by: Kawamura, Kazuki, et al.
Published: (2024)
Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation
by: Yang, Sicheng, et al.
Published: (2025)
Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
by: Cai, Weitong, et al.
Published: (2026)
Secure & Personalized Music-to-Video Generation via CHARCHA
by: Agarwal, Mehul, et al.
Published: (2025)
ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model
by: Cheng, Luo, et al.
Published: (2025)
SentiAvatar: Towards Expressive and Interactive Digital Humans
by: Jin, Chuhao, et al.
Published: (2026)
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)
Across-Game Engagement Modelling via Few-Shot Learning
by: Pinitas, Kosmas, et al.
Published: (2024)
Emotion Based Prediction in the Context of Optimized Trajectory Planning for Immersive Learning
by: Sungheetha, Akey, et al.
Published: (2023)
MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments
by: Tong, Yuqi, et al.
Published: (2024)
Generative Timelines for Instructed Visual Assembly
by: Pardo, Alejandro, et al.
Published: (2024)
Shu Dao: A Calligraphy Score Framework Linking Calligraphy, Music, and Performance
by: Huang, Lican
Published: (2026)
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
by: Sun, Boyuan, et al.
Published: (2025)
Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality
by: Cao, Zidong, et al.
Published: (2024)
SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
by: Lin, Bo, et al.
Published: (2024)
DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos
by: Zheng, Ce, et al.
Published: (2023)
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
by: Zhang, Ziyi, et al.
Published: (2025)
Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
by: Ki, Taekyung, et al.
Published: (2026)
WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture
by: Macario, Giuseppe
Published: (2024)
adder-viz: Real-Time Visualization Software for Transcoding Event Video
by: Freeman, Andrew C., et al.
Published: (2025)
Focus360: Guiding User Attention in Immersive Videos for VR
by: Silva, Paulo Vitor S., et al.
Published: (2026)
Unveiling the Visual Rhetoric of Persuasive Cartography: A Case Study of the Design of Octopus Maps
by: Lin, Daocheng, et al.
Published: (2025)
Soundify: Matching Sound Effects to Video
by: Lin, David Chuan-En, et al.
Published: (2021)
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
by: Yin, Wanqi, et al.
Published: (2025)
MV-Crafter: An Intelligent System for Music-guided Video Generation
by: Chen, Chuer, et al.
Published: (2025)
ComVi: Context-Aware Optimized Comment Display in Video Playback
by: Kim, Minsun, et al.
Published: (2026)