| Main Authors: | Peng, Yichen, Song, Jyun-Ting, Jung, Siyeol, Liu, Ruofan, Liu, Haiyang, Chu, Xuangeng, Liu, Ruicong, Wu, Erwin, Koike, Hideki, Kitani, Kris |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.23165 |
Similar Items
UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
by: Chu, Xuangeng, et al.
Published: (2025)
PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers
by: Zhang, Xiangyue, et al.
Published: (2026)
Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions
by: Khirodkar, Rawal, et al.
Published: (2024)
Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)
Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors
by: Hori, Ryosuke, et al.
Published: (2026)
DiffListener: Discrete Diffusion Model for Listener Generation
by: Jung, Siyeol, et al.
Published: (2025)
Joint Diffusion for Universal Hand-Object Grasp Generation
by: Cao, Jinkun, et al.
Published: (2024)
Generalizable and Animatable Gaussian Head Avatar
by: Chu, Xuangeng, et al.
Published: (2024)
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
by: Choi, Chanhyuk, et al.
Published: (2026)
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
by: Choudhury, Rohan, et al.
Published: (2024)
Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)
Intentional Gesture: Deliver Your Intentions with Gestures for Speech
by: Liu, Pinxin, et al.
Published: (2025)
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model
by: Chen, Bohong, et al.
Published: (2025)
Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
by: Liu, Pinxin, et al.
Published: (2025)
Towards Interactive Intelligence for Digital Humans
by: Cai, Yiyi, et al.
Published: (2025)
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
by: Cui, Ziteng, et al.
Published: (2025)
I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions
by: Liu, Shuhong, et al.
Published: (2025)
Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals
by: Bimbraw, Keshav, et al.
Published: (2024)
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
by: Ye, Yufei, et al.
Published: (2024)
Evaluating a VR System for Collecting Safety-Critical Vehicle-Pedestrian Interactions
by: Weng, Erica, et al.
Published: (2023)
CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
by: Maeda, Takahiro, et al.
Published: (2025)
JaywalkerVR: A VR System for Collecting Safety-Critical Pedestrian-Vehicle Interactions
by: Mukoya, Kenta, et al.
Published: (2024)
Crafting Query-Aware Selective Attention for Single Image Super-Resolution
by: Kim, Junyoung, et al.
Published: (2025)
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
by: Liu, Haiyang, et al.
Published: (2023)
PixelDiT: Pixel Diffusion Transformers for Image Generation
by: Yu, Yongsheng, et al.
Published: (2025)
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
by: Bimbraw, Keshav, et al.
Published: (2024)
Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration
by: Newman, Benjamin A, et al.
Published: (2024)
SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
by: Ouyang, Liangyang, et al.
Published: (2026)
RAWild: Sensor-Agnostic RAW Object Detection via Physics-Guided Curve and Grid Modeling
by: Liu, Shuhong, et al.
Published: (2026)
Zero-Shot Multi-Object Scene Completion
by: Iwase, Shun, et al.
Published: (2024)
Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation
by: He, Tairan, et al.
Published: (2024)
LuxDiT: Lighting Estimation with Video Diffusion Transformer
by: Liang, Ruofan, et al.
Published: (2025)
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image
by: Ma, Xiaoxuan, et al.
Published: (2026)
MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction
by: Chen, Jiahe, et al.
Published: (2024)
ExpertAF: Expert Actionable Feedback from Video
by: Ashutosh, Kumar, et al.
Published: (2024)
Generalizable Neural Human Renderer
by: Masuda, Mana, et al.
Published: (2024)
Environmental Understanding Vision-Language Model for Embodied Agent
by: Bang, Jinsik, et al.
Published: (2026)
Personalizing Causal Audio-Driven Facial Motion via Dynamic Multi-modal Retrieval
by: Chu, Xuangeng, et al.
Published: (2026)
Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction
by: Herbuela, Von Ralph Dane Marquez, et al.
Published: (2025)