:: Library Catalog

Bibliographic Details
Main Authors:	Peng, Yichen; Song, Jyun-Ting; Jung, Siyeol; Liu, Ruofan; Liu, Haiyang; Chu, Xuangeng; Liu, Ruicong; Wu, Erwin; Koike, Hideki; Kitani, Kris
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.23165

Similar Items

UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
by: Chu, Xuangeng, et al.
Published: (2025)

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers
by: Zhang, Xiangyue, et al.
Published: (2026)

Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions
by: Khirodkar, Rawal, et al.
Published: (2024)

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)

Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors
by: Hori, Ryosuke, et al.
Published: (2026)

DiffListener: Discrete Diffusion Model for Listener Generation
by: Jung, Siyeol, et al.
Published: (2025)

Joint Diffusion for Universal Hand-Object Grasp Generation
by: Cao, Jinkun, et al.
Published: (2024)

Generalizable and Animatable Gaussian Head Avatar
by: Chu, Xuangeng, et al.
Published: (2024)

Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
by: Choi, Chanhyuk, et al.
Published: (2026)

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
by: Choudhury, Rohan, et al.
Published: (2024)

Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)

Intentional Gesture: Deliver Your Intentions with Gestures for Speech
by: Liu, Pinxin, et al.
Published: (2025)

DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model
by: Chen, Bohong, et al.
Published: (2025)

Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
by: Liu, Pinxin, et al.
Published: (2025)

Towards Interactive Intelligence for Digital Humans
by: Cai, Yiyi, et al.
Published: (2025)

Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
by: Cui, Ziteng, et al.
Published: (2025)

I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions
by: Liu, Shuhong, et al.
Published: (2025)

Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals
by: Bimbraw, Keshav, et al.
Published: (2024)

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
by: Ye, Yufei, et al.
Published: (2024)

Evaluating a VR System for Collecting Safety-Critical Vehicle-Pedestrian Interactions
by: Weng, Erica, et al.
Published: (2023)

CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
by: Maeda, Takahiro, et al.
Published: (2025)

JaywalkerVR: A VR System for Collecting Safety-Critical Pedestrian-Vehicle Interactions
by: Mukoya, Kenta, et al.
Published: (2024)

Crafting Query-Aware Selective Attention for Single Image Super-Resolution
by: Kim, Junyoung, et al.
Published: (2025)

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
by: Liu, Haiyang, et al.
Published: (2023)

PixelDiT: Pixel Diffusion Transformers for Image Generation
by: Yu, Yongsheng, et al.
Published: (2025)

GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
by: Bimbraw, Keshav, et al.
Published: (2024)

Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration
by: Newman, Benjamin A, et al.
Published: (2024)

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
by: Ouyang, Liangyang, et al.
Published: (2026)

RAWild: Sensor-Agnostic RAW Object Detection via Physics-Guided Curve and Grid Modeling
by: Liu, Shuhong, et al.
Published: (2026)

Zero-Shot Multi-Object Scene Completion
by: Iwase, Shun, et al.
Published: (2024)

Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation
by: He, Tairan, et al.
Published: (2024)

LuxDiT: Lighting Estimation with Video Diffusion Transformer
by: Liang, Ruofan, et al.
Published: (2025)

REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image
by: Ma, Xiaoxuan, et al.
Published: (2026)

MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction
by: Chen, Jiahe, et al.
Published: (2024)

ExpertAF: Expert Actionable Feedback from Video
by: Ashutosh, Kumar, et al.
Published: (2024)

Generalizable Neural Human Renderer
by: Masuda, Mana, et al.
Published: (2024)

Environmental Understanding Vision-Language Model for Embodied Agent
by: Bang, Jinsik, et al.
Published: (2026)

Personalizing Causal Audio-Driven Facial Motion via Dynamic Multi-modal Retrieval
by: Chu, Xuangeng, et al.
Published: (2026)

Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction
by: Herbuela, Von Ralph Dane Marquez, et al.
Published: (2025)