| Main Author: | Rojas-Galeano, Sergio |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.21715 |
Similar Items
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)
Beamforming-LLM: What, Where and When Did I Miss?
by: Choudhari, Vishal
Published: (2025)
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
by: Andrusenko, Andrei, et al.
Published: (2026)
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
by: Chen, Qian, et al.
Published: (2025)
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
by: Huang, Ailin, et al.
Published: (2025)
Step-Audio-EditX Technical Report
by: Yan, Chao, et al.
Published: (2025)
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
by: Jiang, Xilin, et al.
Published: (2025)
Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)
Language Model Can Listen While Speaking
by: Ma, Ziyang, et al.
Published: (2024)
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
by: Chen, Haozhe, et al.
Published: (2024)
Open-Source Conversational AI with SpeechBrain 1.0
by: Ravanelli, Mirco, et al.
Published: (2024)
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
by: Chen, Maximillian, et al.
Published: (2026)
Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
by: Wang, Yuchen, et al.
Published: (2026)
Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
by: Pan, Guandong, et al.
Published: (2025)
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
by: Erscoi, Lelia, et al.
Published: (2026)
InsightPulse: An IoT-based System for User Experience Interview Analysis
by: Lyu, Dian, et al.
Published: (2024)
Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice
by: Mandai, Yuto, et al.
Published: (2025)
Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents
by: Fujii, Takao, et al.
Published: (2025)
Qualitative Approaches to Voice UX
by: Seaborn, Katie, et al.
Published: (2024)
A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals
by: Auster, Quentin, et al.
Published: (2025)
Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning
by: Echeverria, Alex, et al.
Published: (2025)
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
by: Liu, Tianyun
Published: (2025)
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
by: Wang, Xihuai, et al.
Published: (2025)
EmoHeal: An End-to-End System for Personalized Therapeutic Music Retrieval from Fine-grained Emotions
by: Wan, Xinchen, et al.
Published: (2025)
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
by: Xie, Zhifei, et al.
Published: (2024)
DeformTune: A Deformable XAI Music Prototype for Non-Musicians
by: Xu, Ziqing, et al.
Published: (2025)
CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays
by: Han, Runduo, et al.
Published: (2025)
Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
by: Kotowski, Błażej, et al.
Published: (2025)
EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
by: Wu, Liang-Yuan, et al.
Published: (2025)
Reimagining Dance: Real-time Music Co-creation between Dancers and AI
by: Vechtomova, Olga, et al.
Published: (2025)
MCP2OSC: Parametric Control by Natural Language
by: Fan, Yuan-Yi
Published: (2025)
Human Perception of Audio Deepfakes
by: Müller, Nicolas M., et al.
Published: (2021)
SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
by: Brade, Stephen, et al.
Published: (2023)
Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
by: Kwon, Joonwoo, et al.
Published: (2024)
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)
LSTM-CNN Network for Audio Signature Analysis in Noisy Environments
by: Damacharla, Praveen, et al.
Published: (2023)
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)
STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
by: Yan, Hui, et al.
Published: (2024)
A Theory-Based Explainable Deep Learning Architecture for Music Emotion
by: Fong, Hortense, et al.
Published: (2024)