Library Catalog

Bibliographic Details
Main Authors:	Lin, Zhi-Yi; Markhorst, Thomas; Chew, Jouh Yeong; Zhang, Xucong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.08125

Similar Items

MuPPet: Multi-person 2D-to-3D Pose Lifting
by: Markhorst, Thomas, et al.
Published: (2026)

GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
by: Lin, Zhi-Yi, et al.
Published: (2024)

EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
by: Zhang, Yu, et al.
Published: (2026)

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
by: Cen, Zhi, et al.
Published: (2025)

3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data
by: Lin, Zhi-Yi, et al.
Published: (2024)

UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
by: Qin, Jiawei, et al.
Published: (2025)

End-to-end Listen, Look, Speak and Act
by: Wang, Siyin, et al.
Published: (2025)

Pushing Joint Image Denoising and Classification to the Edge
by: Markhorst, Thomas C, et al.
Published: (2024)

VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
by: Li, Shiying, et al.
Published: (2025)

UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
by: Chu, Xuangeng, et al.
Published: (2025)

AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement
by: Fu, Kang, et al.
Published: (2024)

Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement
by: Qin, Jiawei, et al.
Published: (2023)

PrivateGaze: Preserving User Privacy in Black-box Mobile Gaze Tracking Services
by: Du, Lingyu, et al.
Published: (2024)

Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
by: Liu, Miao, et al.
Published: (2026)

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
by: Long, Lin, et al.
Published: (2025)

Beyond Words: Multimodal LLM Knows When to Speak
by: Liao, Zikai, et al.
Published: (2025)

Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
by: Liu, Yunze, et al.
Published: (2023)

CASHG: Context-Aware Stylized Online Handwriting Generation
by: Shin, Jinsu, et al.
Published: (2026)

MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
by: Zhou, Bohan, et al.
Published: (2025)

GGPT: Geometry Grounded Point Transformer
by: Chen, Yutong, et al.
Published: (2026)

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
by: Liu, Xi, et al.
Published: (2024)

Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks
by: Chew, Oscar, et al.
Published: (2024)

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)

Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
by: He, Yufei, et al.
Published: (2025)

Multimodal Information Interaction for Medical Image Segmentation
by: Fan, Xinxin, et al.
Published: (2024)

Poly-Autoregressive Prediction for Modeling Interactions
by: Thakkar, Neerja, et al.
Published: (2025)

ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
by: Wang, Ke, et al.
Published: (2025)

Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2026)

HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
by: Huang, Ziyao, et al.
Published: (2025)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

ViSpeak: Visual Instruction Feedback in Streaming Videos
by: Fu, Shenghao, et al.
Published: (2025)

GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization
by: Jin, Zhangyu, et al.
Published: (2026)

PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks
by: Kim, Da-Yeong, et al.
Published: (2025)

DANCE: Density-agnostic and Class-aware Network for Point Cloud Completion
by: Kim, Da-Yeong, et al.
Published: (2025)

OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2025)

ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
by: Tan, Jiangtong, et al.
Published: (2025)

Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery
by: Mao, Jecia Z. Y., et al.
Published: (2026)

Decoupling Layout from Glyph in Online Chinese Handwriting Generation
by: Ren, Min-Si, et al.
Published: (2024)

Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)