:: Library Catalog

Bibliographic Details
Main Authors:	Wagner, Philipp, Triantafyllopoulos, Andreas, Gebhard, Alexander, Schuller, Björn
Format:	Preprint
Published:	2024
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.06339

Similar Titles

Exploring Meta Information for Audio-based Zero-shot Bird Classification
by: Gebhard, Alexander, et al.
Published: (2023)

An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
by: Jing, Xin, et al.
Published: (2024)

SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing
by: Jing, Xin, et al.
Published: (2026)

Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
by: Milling, Manuel, et al.
Published: (2024)

Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)

Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
by: Milling, Manuel, et al.
Published: (2023)

MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
by: Jing, Xin, et al.
Published: (2025)

autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
by: Rampp, Simon, et al.
Published: (2024)

Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)

From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
by: Li, Yupei, et al.
Published: (2024)

DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition
by: Jing, Xin, et al.
Published: (2024)

Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)

Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation
by: Zhang, Zixing, et al.
Published: (2024)

DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset
by: Li, Yupei, et al.
Published: (2025)

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)

Cross-Dialect Bird Species Recognition with Dialect-Calibrated Augmentation
by: Ding, Jiani, et al.
Published: (2025)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
by: Latif, Siddique, et al.
Published: (2023)

Detecting COPD Through Speech Analysis: A Dataset of Danish Speech and Machine Learning Approach
by: Sankey-Olsen, Cuno, et al.
Published: (2025)

Combining Audio and Non-Audio Inputs in Evolved Neural Networks for Ovenbird
by: Hernandez, Sergio Poo, et al.
Published: (2025)

Explainable Detection of Machine Generated Music and Early Systematic Evaluation
by: Li, Yupei, et al.
Published: (2024)

Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
by: Moritz, Martin, et al.
Published: (2024)

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)

Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control
by: Fuglsig, Andreas Jonas, et al.
Published: (2026)

This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach
by: Christ, Lukas, et al.
Published: (2024)

Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline
by: Aziz, Shiran, et al.
Published: (2024)

Lightweight Implicit Neural Network for Binaural Audio Synthesis
by: Lu, Xikun, et al.
Published: (2025)

M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
by: Li, Yupei, et al.
Published: (2024)

Using voice analysis as an early indicator of risk for depression in young adults
by: Scherer, Klaus R., et al.
Published: (2024)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2024)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)

Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)

TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification
by: Xiao, Zhenyuan, et al.
Published: (2024)

VCNAC: A Variable-Channel Neural Audio Codec for Mono, Stereo, and Surround Sound
by: Grötschla, Florian, et al.
Published: (2026)

Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification
by: Araz, R. Oguz, et al.
Published: (2025)