:: Library Catalog

Bibliographic Details
Main Authors:	Tanke, Julian; Shibuya, Takashi; Uchida, Kengo; Saito, Koichi; Mitsufuji, Yuki
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.09827

Similar Items

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
by: Uchida, Kengo, et al.
Published: (2024)

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
by: Chen, Jiaben, et al.
Published: (2025)

HumanGif: Single-View Human Diffusion with Generative Prior
by: Hu, Shoukang, et al.
Published: (2025)

Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
by: Hayakawa, Akio, et al.
Published: (2025)

Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity
by: Kobayashi, Yuya, et al.
Published: (2025)

Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication
by: Sun, Mingze, et al.
Published: (2024)

Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion
by: Wang, Zesheng, et al.
Published: (2025)

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
by: Hayakawa, Akio, et al.
Published: (2024)

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
by: Chen, Xu, et al.
Published: (2026)

Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset
by: Agrawal, Vasu, et al.
Published: (2025)

Massively Multi-Person 3D Human Motion Forecasting with Scene Context
by: Mueller, Felix B, et al.
Published: (2024)

Dyadic Interaction Modeling for Social Behavior Generation
by: Tran, Minh, et al.
Published: (2024)

TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models
by: Simon, Christian, et al.
Published: (2025)

AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
by: Yu, Zhengyang, et al.
Published: (2025)

TraSCE: Trajectory Steering for Concept Erasure
by: Jain, Anubhav, et al.
Published: (2024)

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
by: Jain, Anubhav, et al.
Published: (2024)

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
by: Cheng, Ho Kei, et al.
Published: (2024)

Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
by: Jain, Anubhav, et al.
Published: (2025)

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
by: Takida, Yuhta, et al.
Published: (2023)

Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
by: Simon, Christian, et al.
Published: (2026)

CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
by: Chen, Yuanhong, et al.
Published: (2025)

EgoAnimate: Generating Human Animations from Egocentric top-down Views
by: Türkoglu, G. Kutay, et al.
Published: (2025)

Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis
by: Zhang, Xiang, et al.
Published: (2026)

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
by: Zhu, Yongming, et al.
Published: (2024)

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
by: Seo, Junyoung, et al.
Published: (2024)

InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance
by: Pan, Dongwei, et al.
Published: (2026)

StereoSync: Spatially-Aware Stereo Audio Generation from Video
by: Marinoni, Christian, et al.
Published: (2025)

Interpersonal Relationship Analysis with Dyadic EEG Signals via Learning Spatial-Temporal Patterns
by: Ji, Wenqi, et al.
Published: (2024)

DyaDiT: A Multi-Modal Diffusion Transformer for Socially Favorable Dyadic Gesture Generation
by: Peng, Yichen, et al.
Published: (2026)

Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
by: Xu, Weihan, et al.
Published: (2025)

Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents
by: Zhang, Zeyi, et al.
Published: (2025)

Motion Mamba: Efficient and Long Sequence Motion Generation
by: Zhang, Zeyu, et al.
Published: (2024)

DeepFake Detection in Dyadic Video Calls using Point of Gaze Tracking
by: Kohler, Odin, et al.
Published: (2025)

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
by: Yang, Shiqi, et al.
Published: (2024)

OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2025)

DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model
by: Chen, Bohong, et al.
Published: (2025)

ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)

AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction
by: Idrees, Sarmad, et al.
Published: (2024)

Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification
by: Shimada, Kazuki, et al.
Published: (2025)

Read the Room: Inferring Social Context Through Dyadic Interaction Recognition in Cyber-physical-social Infrastructure Systems
by: Lin, Cheyu, et al.
Published: (2025)