:: Library Catalog

Bibliographic Details
Main Authors:	Dong, Zhongren; Zhang, Zixing; Xu, Weixiang; Han, Jing; Ou, Jianjun; Schuller, Björn W.
Format:	Preprint
Published:	2024
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2405.03952

Similar Items

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation
by: Zhang, Zixing, et al.
Published: (2024)

ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models
by: Zhang, Zixing, et al.
Published: (2024)

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)

Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
by: Jing, Xin, et al.
Published: (2024)

From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
by: Li, Yupei, et al.
Published: (2024)

Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification
by: Gao, Yifan, et al.
Published: (2024)

Integrating Pause Information with Word Embeddings in Language Models for Alzheimer's Disease Detection from Spontaneous Speech
by: Pu, Yu, et al.
Published: (2025)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
by: Latif, Siddique, et al.
Published: (2023)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)

Explainable Detection of Machine Generated Music and Early Systematic Evaluation
by: Li, Yupei, et al.
Published: (2024)

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2022)

Adaptive Speech Emotion Representation Learning Based On Dynamic Graph
by: Gao, Yingxue, et al.
Published: (2024)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)

M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
by: Li, Yupei, et al.
Published: (2024)

Cross-Dialect Bird Species Recognition with Dialect-Calibrated Augmentation
by: Ding, Jiani, et al.
Published: (2025)

Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2025)

Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

Enhancing Speech Emotion Recognition Through Differentiable Architecture Search
by: Rajapakshe, Thejan, et al.
Published: (2023)

Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines
by: Wagner, Philipp, et al.
Published: (2024)

Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)

Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
by: Schrüfer, Oliver, et al.
Published: (2024)

Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)

SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing
by: Jing, Xin, et al.
Published: (2026)

MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
by: Jing, Xin, et al.
Published: (2025)

An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset
by: Li, Yupei, et al.
Published: (2025)

Using voice analysis as an early indicator of risk for depression in young adults
by: Scherer, Klaus R., et al.
Published: (2024)

Speech as a Biomarker for Disease Detection
by: Botelho, Catarina, et al.
Published: (2024)

Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
by: Lu, Cheng, et al.
Published: (2024)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)