Bibliographic Details
Main Author:	Rojas-Galeano, Sergio
Format:	Preprint
Published:	2025
Subjects:	Human-Computer Interaction; Artificial Intelligence; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.21715

Similar Items

Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)

Beamforming-LLM: What, Where and When Did I Miss?
by: Choudhari, Vishal
Published: (2025)

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
by: Andrusenko, Andrei, et al.
Published: (2026)

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
by: Chen, Qian, et al.
Published: (2025)

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
by: Huang, Ailin, et al.
Published: (2025)

Step-Audio-EditX Technical Report
by: Yan, Chao, et al.
Published: (2025)

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
by: Jiang, Xilin, et al.
Published: (2025)

Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)

Language Model Can Listen While Speaking
by: Ma, Ziyang, et al.
Published: (2024)

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
by: Chen, Haozhe, et al.
Published: (2024)

Open-Source Conversational AI with SpeechBrain 1.0
by: Ravanelli, Mirco, et al.
Published: (2024)

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
by: Chen, Maximillian, et al.
Published: (2026)

Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
by: Wang, Yuchen, et al.
Published: (2026)

Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
by: Pan, Guandong, et al.
Published: (2025)

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
by: Erscoi, Lelia, et al.
Published: (2026)

InsightPulse: An IoT-based System for User Experience Interview Analysis
by: Lyu, Dian, et al.
Published: (2024)

Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice
by: Mandai, Yuto, et al.
Published: (2025)

Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents
by: Fujii, Takao, et al.
Published: (2025)

Qualitative Approaches to Voice UX
by: Seaborn, Katie, et al.
Published: (2024)

A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals
by: Auster, Quentin, et al.
Published: (2025)

Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning
by: Echeverria, Alex, et al.
Published: (2025)

Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
by: Liu, Tianyun
Published: (2025)

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
by: Wang, Xihuai, et al.
Published: (2025)

EmoHeal: An End-to-End System for Personalized Therapeutic Music Retrieval from Fine-grained Emotions
by: Wan, Xinchen, et al.
Published: (2025)

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
by: Xie, Zhifei, et al.
Published: (2024)

DeformTune: A Deformable XAI Music Prototype for Non-Musicians
by: Xu, Ziqing, et al.
Published: (2025)

CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays
by: Han, Runduo, et al.
Published: (2025)

Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
by: Kotowski, Błażej, et al.
Published: (2025)

EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
by: Wu, Liang-Yuan, et al.
Published: (2025)

Reimagining Dance: Real-time Music Co-creation between Dancers and AI
by: Vechtomova, Olga, et al.
Published: (2025)

MCP2OSC: Parametric Control by Natural Language
by: Fan, Yuan-Yi
Published: (2025)

Human Perception of Audio Deepfakes
by: Müller, Nicolas M., et al.
Published: (2021)

SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
by: Brade, Stephen, et al.
Published: (2023)

Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
by: Kwon, Joonwoo, et al.
Published: (2024)

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)

LSTM-CNN Network for Audio Signature Analysis in Noisy Environments
by: Damacharla, Praveen, et al.
Published: (2023)

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
by: Yan, Hui, et al.
Published: (2024)

A Theory-Based Explainable Deep Learning Architecture for Music Emotion
by: Fong, Hortense, et al.
Published: (2024)