:: Library Catalog

Bibliographic Details
Main authors:	Nasr, Seham; Ren, Zhao; Johnson, David
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Sound
Online access:	https://arxiv.org/abs/2511.11691

Similar Items

Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions
by: Mitra, Vikramjit, et al.
Published: (2025)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

Towards measuring fairness in speech recognition: Fair-Speech dataset
by: Veliche, Irina-Elena, et al.
Published: (2024)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Learning discriminative features from spectrograms using center loss for speech emotion recognition
by: Dai, Dongyang, et al.
Published: (2025)

A vector quantized masked autoencoder for audiovisual speech emotion recognition
by: Sadok, Samir, et al.
Published: (2023)

Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization
by: Della Libera, Luca, et al.
Published: (2026)

Keyword spotting using convolutional neural network for speech recognition in Hindi
by: Bharti, Saru, et al.
Published: (2026)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

Adversarial multi-task underwater acoustic target recognition: towards robustness against various influential factors
by: Xie, Yuan, et al.
Published: (2024)

Switchable deep beamformer for high-quality and real-time passive acoustic mapping
by: Zeng, Yi, et al.
Published: (2024)

Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet
by: Wang, Shenran, et al.
Published: (2025)

Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

Differential privacy enables fair and accurate AI-based analysis of speech disorders while protecting patient data
by: Arasteh, Soroosh Tayebi, et al.
Published: (2024)

FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
by: Wu, Junda, et al.
Published: (2024)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)

Preference-Based Learning in Audio Applications: A Systematic Analysis
by: Broukhim, Aaron, et al.
Published: (2025)

Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models
by: Marincione, Davide, et al.
Published: (2025)

Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
by: Bradshaw, Louis, et al.
Published: (2025)

Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement
by: Cross, Mattias, et al.
Published: (2025)

AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
by: Wang, Lu, et al.
Published: (2025)

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
by: Wang, Qizhou, et al.
Published: (2025)

Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling
by: Choudhary, Yash, et al.
Published: (2025)

Privacy-Enhancing Infant Cry Classification with Federated Transformers and Denoising Regularization
by: Owino, Geofrey, et al.
Published: (2025)

Hookpad Aria: A Copilot for Songwriters
by: Donahue, Chris, et al.
Published: (2025)

DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching
by: Chen, Wei, et al.
Published: (2025)

QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems
by: Wang, Chien-Chun, et al.
Published: (2025)

Survey on the Evaluation of Generative Models in Music
by: Lerch, Alexander, et al.
Published: (2025)

Improving Underwater Acoustic Classification Through Learnable Gabor Filter Convolution and Attention Mechanisms
by: Domingos, Lucas Cesar Ferreira, et al.
Published: (2025)

Evaluation of Deep Audio Representations for Hearables
by: Gröger, Fabian, et al.
Published: (2025)

Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation
by: Si, Chen, et al.
Published: (2025)