| Main Authors: | Erscoi, Lelia, Kinnunen, Tomi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28064 |
Similar Items
Beamforming-LLM: What, Where and When Did I Miss?
by: Choudhari, Vishal
Published: (2025)
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)
STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)
Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)
Human Perception of Audio Deepfakes
by: Müller, Nicolas M., et al.
Published: (2021)
CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays
by: Han, Runduo, et al.
Published: (2025)
Step-Audio-EditX Technical Report
by: Yan, Chao, et al.
Published: (2025)
Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music
by: Sarmento, Pedro, et al.
Published: (2024)
InsightPulse: An IoT-based System for User Experience Interview Analysis
by: Lyu, Dian, et al.
Published: (2024)
Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
by: Pan, Guandong, et al.
Published: (2025)
PersonaCite: VoC-Grounded Interviewable Agentic Synthetic AI Personas for Verifiable User and Design Research
by: Truss, Mario
Published: (2026)
Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models
by: Kutum, Subham, et al.
Published: (2025)
Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness
by: Yang, Sicheng, et al.
Published: (2024)
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
by: Huang, Ailin, et al.
Published: (2025)
DeformTune: A Deformable XAI Music Prototype for Non-Musicians
by: Xu, Ziqing, et al.
Published: (2025)
A Theory-Based Explainable Deep Learning Architecture for Music Emotion
by: Fong, Hortense, et al.
Published: (2024)
Tidal MerzA: Combining affective modelling and autonomous code generation through Reinforcement Learning
by: Wilson, Elizabeth, et al.
Published: (2024)
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
by: Zhang, Fan, et al.
Published: (2024)
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
by: Xie, Zhifei, et al.
Published: (2024)
More-than-Human Storytelling: Designing Longitudinal Narrative Engagements with Generative AI
by: Fabre, Émilie, et al.
Published: (2025)
SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
by: Brade, Stephen, et al.
Published: (2023)
Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
by: Kotowski, Błażej, et al.
Published: (2025)
Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
by: Kwon, Joonwoo, et al.
Published: (2024)
EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
by: Wu, Liang-Yuan, et al.
Published: (2025)
Reimagining Dance: Real-time Music Co-creation between Dancers and AI
by: Vechtomova, Olga, et al.
Published: (2025)
LSTM-CNN Network for Audio Signature Analysis in Noisy Environments
by: Damacharla, Praveen, et al.
Published: (2023)
MCP2OSC: Parametric Control by Natural Language
by: Fan, Yuan-Yi
Published: (2025)
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
by: Yan, Hui, et al.
Published: (2024)
Interactive Melody Generation System for Enhancing the Creativity of Musicians
by: Hirawata, So, et al.
Published: (2024)
Tipping Points, Pulse Elasticity and Tonal Tension: An Empirical Study on What Generates Tipping Points
by: Naik, Canishk, et al.
Published: (2024)
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
by: Nakilcioglu, Emin Cagatay, et al.
Published: (2023)
Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
by: Sinha, Abhijit, et al.
Published: (2025)
Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition
by: Tiwari, Upasana, et al.
Published: (2025)
DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2
by: Zhang, Fan, et al.
Published: (2024)
Open-Source Conversational AI with SpeechBrain 1.0
by: Ravanelli, Mirco, et al.
Published: (2024)
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
by: Wang, Xihuai, et al.
Published: (2025)
Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
by: Kukanov, Ivan, et al.
Published: (2024)
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
by: Li, Yue, et al.
Published: (2024)