| Main Authors: | Tanke, Julian; Shibuya, Takashi; Uchida, Kengo; Saito, Koichi; Mitsufuji, Yuki |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.09827 |
Similar Items
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
by: Uchida, Kengo, et al.
Published: (2024)
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
by: Chen, Jiaben, et al.
Published: (2025)
HumanGif: Single-View Human Diffusion with Generative Prior
by: Hu, Shoukang, et al.
Published: (2025)
Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
by: Hayakawa, Akio, et al.
Published: (2025)
Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity
by: Kobayashi, Yuya, et al.
Published: (2025)
Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication
by: Sun, Mingze, et al.
Published: (2024)
Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion
by: Wang, Zesheng, et al.
Published: (2025)
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
by: Hayakawa, Akio, et al.
Published: (2024)
Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
by: Chen, Xu, et al.
Published: (2026)
Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset
by: Agrawal, Vasu, et al.
Published: (2025)
Massively Multi-Person 3D Human Motion Forecasting with Scene Context
by: Mueller, Felix B, et al.
Published: (2024)
Dyadic Interaction Modeling for Social Behavior Generation
by: Tran, Minh, et al.
Published: (2024)
TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models
by: Simon, Christian, et al.
Published: (2025)
AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
by: Yu, Zhengyang, et al.
Published: (2025)
TraSCE: Trajectory Steering for Concept Erasure
by: Jain, Anubhav, et al.
Published: (2024)
Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
by: Jain, Anubhav, et al.
Published: (2024)
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
by: Cheng, Ho Kei, et al.
Published: (2024)
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
by: Jain, Anubhav, et al.
Published: (2025)
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
by: Takida, Yuhta, et al.
Published: (2023)
Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
by: Simon, Christian, et al.
Published: (2026)
CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
by: Chen, Yuanhong, et al.
Published: (2025)
EgoAnimate: Generating Human Animations from Egocentric top-down Views
by: Türkoglu, G. Kutay, et al.
Published: (2025)
Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis
by: Zhang, Xiang, et al.
Published: (2026)
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
by: Zhu, Yongming, et al.
Published: (2024)
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
by: Seo, Junyoung, et al.
Published: (2024)
InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance
by: Pan, Dongwei, et al.
Published: (2026)
StereoSync: Spatially-Aware Stereo Audio Generation from Video
by: Marinoni, Christian, et al.
Published: (2025)
Interpersonal Relationship Analysis with Dyadic EEG Signals via Learning Spatial-Temporal Patterns
by: Ji, Wenqi, et al.
Published: (2024)
DyaDiT: A Multi-Modal Diffusion Transformer for Socially Favorable Dyadic Gesture Generation
by: Peng, Yichen, et al.
Published: (2026)
Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
by: Xu, Weihan, et al.
Published: (2025)
Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents
by: Zhang, Zeyi, et al.
Published: (2025)
Motion Mamba: Efficient and Long Sequence Motion Generation
by: Zhang, Zeyu, et al.
Published: (2024)
DeepFake Detection in Dyadic Video Calls using Point of Gaze Tracking
by: Kohler, Odin, et al.
Published: (2025)
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
by: Yang, Shiqi, et al.
Published: (2024)
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2025)
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model
by: Chen, Bohong, et al.
Published: (2025)
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
by: Luo, Cheng, et al.
Published: (2023)
AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction
by: Idrees, Sarmad, et al.
Published: (2024)
Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification
by: Shimada, Kazuki, et al.
Published: (2025)
Read the Room: Inferring Social Context Through Dyadic Interaction Recognition in Cyber-physical-social Infrastructure Systems
by: Lin, Cheyu, et al.
Published: (2025)