| Main Authors: | Lin, Zhi-Yi, Markhorst, Thomas, Chew, Jouh Yeong, Zhang, Xucong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.08125 |
Similar Items
MuPPet: Multi-person 2D-to-3D Pose Lifting
by: Markhorst, Thomas, et al.
Published: (2026)
GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
by: Lin, Zhi-Yi, et al.
Published: (2024)
EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
by: Zhang, Yu, et al.
Published: (2026)
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
by: Cen, Zhi, et al.
Published: (2025)
3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data
by: Lin, Zhi-Yi, et al.
Published: (2024)
UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
by: Qin, Jiawei, et al.
Published: (2025)
End-to-end Listen, Look, Speak and Act
by: Wang, Siyin, et al.
Published: (2025)
Pushing Joint Image Denoising and Classification to the Edge
by: Markhorst, Thomas C, et al.
Published: (2024)
VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
by: Li, Shiying, et al.
Published: (2025)
UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
by: Chu, Xuangeng, et al.
Published: (2025)
AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement
by: Fu, Kang, et al.
Published: (2024)
Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement
by: Qin, Jiawei, et al.
Published: (2023)
PrivateGaze: Preserving User Privacy in Black-box Mobile Gaze Tracking Services
by: Du, Lingyu, et al.
Published: (2024)
Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
by: Liu, Miao, et al.
Published: (2026)
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
by: Long, Lin, et al.
Published: (2025)
Beyond Words: Multimodal LLM Knows When to Speak
by: Liao, Zikai, et al.
Published: (2025)
Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
by: Liu, Yunze, et al.
Published: (2023)
CASHG: Context-Aware Stylized Online Handwriting Generation
by: Shin, Jinsu, et al.
Published: (2026)
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
by: Zhou, Bohan, et al.
Published: (2025)
GGPT: Geometry Grounded Point Transformer
by: Chen, Yutong, et al.
Published: (2026)
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
by: Liu, Xi, et al.
Published: (2024)
Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks
by: Chew, Oscar, et al.
Published: (2024)
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)
Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
by: He, Yufei, et al.
Published: (2025)
Multimodal Information Interaction for Medical Image Segmentation
by: Fan, Xinxin, et al.
Published: (2024)
Poly-Autoregressive Prediction for Modeling Interactions
by: Thakkar, Neerja, et al.
Published: (2025)
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
by: Wang, Ke, et al.
Published: (2025)
Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2026)
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
by: Huang, Ziyao, et al.
Published: (2025)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
ViSpeak: Visual Instruction Feedback in Streaming Videos
by: Fu, Shenghao, et al.
Published: (2025)
GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization
by: Jin, Zhangyu, et al.
Published: (2026)
PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks
by: Kim, Da-Yeong, et al.
Published: (2025)
DANCE: Density-agnostic and Class-aware Network for Point Cloud Completion
by: Kim, Da-Yeong, et al.
Published: (2025)
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2025)
ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
by: Tan, Jiangtong, et al.
Published: (2025)
Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery
by: Mao, Jecia Z. Y., et al.
Published: (2026)
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
by: Ren, Min-Si, et al.
Published: (2024)
Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)