
Bibliographic Details
Main Authors:	Dixit, Satvik; Low, Daniel M.; Elbanna, Gasser; Catania, Fabio; Ghosh, Satrajit S.
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.09511

Similar Items

Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)

Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
by: Alavilli, Sagarika, et al.
Published: (2024)

Predicting Heart Activity from Speech using Data-driven and Knowledge-based features
by: Elbanna, Gasser, et al.
Published: (2024)

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)

MACE: Leveraging Audio for Evaluating Audio Captioning Systems
by: Dixit, Satvik, et al.
Published: (2024)

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
by: Dixit, Satvik, et al.
Published: (2024)

Learning Perceptually Relevant Temporal Envelope Morphing
by: Dixit, Satvik, et al.
Published: (2025)

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)

Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
by: Talpur, Unzela, et al.
Published: (2025)

Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model
by: Oluwademilade, Adelekun, et al.
Published: (2026)

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
by: Saliba, Alexandra, et al.
Published: (2024)

Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift
by: Hjuler, Maja J., et al.
Published: (2025)

MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition
by: Jon, Hyo Jin, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

On the Contribution of Lexical Features to Speech Emotion Recognition
by: Combei, David
Published: (2025)

End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)

Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
by: Comanducci, Luca, et al.
Published: (2024)

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
by: Ulgen, Ismail Rasim, et al.
Published: (2024)

End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning
by: Ren, Zhao, et al.
Published: (2025)

Interpretable Embeddings of Speech Enhance and Explain Brain Encoding Performance of Audio Models
by: Shimizu, Riki, et al.
Published: (2025)

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)

Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)

Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)

THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition
by: Sun, Haoqin, et al.
Published: (2024)

Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)

Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition
by: Hu, Cheng-Hung, et al.
Published: (2025)

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)

PCQ: Emotion Recognition in Speech via Progressive Channel Querying
by: Wang, Xincheng, et al.
Published: (2024)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios
by: Bukhari, Hazim, et al.
Published: (2024)

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
by: Li, Pengcheng, et al.
Published: (2025)

Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2022)

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)