Library Catalog

Bibliographic Details
Main Authors:	Erscoi, Lelia; Kinnunen, Tomi
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2605.28064

Similar Items

Beamforming-LLM: What, Where and When Did I Miss?
by: Choudhari, Vishal
Published: (2025)

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)

Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)

Human Perception of Audio Deepfakes
by: Müller, Nicolas M., et al.
Published: (2021)

CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays
by: Han, Runduo, et al.
Published: (2025)

Step-Audio-EditX Technical Report
by: Yan, Chao, et al.
Published: (2025)

Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music
by: Sarmento, Pedro, et al.
Published: (2024)

InsightPulse: An IoT-based System for User Experience Interview Analysis
by: Lyu, Dian, et al.
Published: (2024)

Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
by: Pan, Guandong, et al.
Published: (2025)

PersonaCite: VoC-Grounded Interviewable Agentic Synthetic AI Personas for Verifiable User and Design Research
by: Truss, Mario
Published: (2026)

Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models
by: Kutum, Subham, et al.
Published: (2025)

Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness
by: Yang, Sicheng, et al.
Published: (2024)

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
by: Huang, Ailin, et al.
Published: (2025)

DeformTune: A Deformable XAI Music Prototype for Non-Musicians
by: Xu, Ziqing, et al.
Published: (2025)

A Theory-Based Explainable Deep Learning Architecture for Music Emotion
by: Fong, Hortense, et al.
Published: (2024)

Tidal MerzA: Combining affective modelling and autonomous code generation through Reinforcement Learning
by: Wilson, Elizabeth, et al.
Published: (2024)

Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)

Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
by: Zhang, Fan, et al.
Published: (2024)

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
by: Xie, Zhifei, et al.
Published: (2024)

More-than-Human Storytelling: Designing Longitudinal Narrative Engagements with Generative AI
by: Fabre, Émilie, et al.
Published: (2025)

SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
by: Brade, Stephen, et al.
Published: (2023)

Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
by: Kotowski, Błażej, et al.
Published: (2025)

Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
by: Kwon, Joonwoo, et al.
Published: (2024)

EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
by: Wu, Liang-Yuan, et al.
Published: (2025)

Reimagining Dance: Real-time Music Co-creation between Dancers and AI
by: Vechtomova, Olga, et al.
Published: (2025)

LSTM-CNN Network for Audio Signature Analysis in Noisy Environments
by: Damacharla, Praveen, et al.
Published: (2023)

MCP2OSC: Parametric Control by Natural Language
by: Fan, Yuan-Yi
Published: (2025)

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
by: Yan, Hui, et al.
Published: (2024)

Interactive Melody Generation System for Enhancing the Creativity of Musicians
by: Hirawata, So, et al.
Published: (2024)

Tipping Points, Pulse Elasticity and Tonal Tension: An Empirical Study on What Generates Tipping Points
by: Naik, Canishk, et al.
Published: (2024)

Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
by: Nakilcioglu, Emin Cagatay, et al.
Published: (2023)

Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
by: Sinha, Abhijit, et al.
Published: (2025)

Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition
by: Tiwari, Upasana, et al.
Published: (2025)

DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2
by: Zhang, Fan, et al.
Published: (2024)

Open-Source Conversational AI with SpeechBrain 1.0
by: Ravanelli, Mirco, et al.
Published: (2024)

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
by: Wang, Xihuai, et al.
Published: (2025)

Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
by: Kukanov, Ivan, et al.
Published: (2024)

A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
by: Li, Yue, et al.
Published: (2024)