| Main Authors: | Wagner, Philipp, Triantafyllopoulos, Andreas, Gebhard, Alexander, Schuller, Björn |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.06339 |
Similar Titles
Exploring Meta Information for Audio-based Zero-shot Bird Classification
by: Gebhard, Alexander, et al.
Published: (2023)
An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)
Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
by: Jing, Xin, et al.
Published: (2024)
SmoothCLAP: Soft-Target Enhanced Contrastive Language--Audio Pretraining for Affective Computing
by: Jing, Xin, et al.
Published: (2026)
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
by: Milling, Manuel, et al.
Published: (2024)
Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
by: Milling, Manuel, et al.
Published: (2023)
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
by: Jing, Xin, et al.
Published: (2025)
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
by: Rampp, Simon, et al.
Published: (2024)
Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
by: Li, Yupei, et al.
Published: (2024)
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition
by: Jing, Xin, et al.
Published: (2024)
Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)
Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation
by: Zhang, Zixing, et al.
Published: (2024)
DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset
by: Li, Yupei, et al.
Published: (2025)
Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)
Cross-Dialect Bird Species Recognition with Dialect-Calibrated Augmentation
by: Ding, Jiani, et al.
Published: (2025)
Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)
Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
by: Latif, Siddique, et al.
Published: (2023)
Detecting COPD Through Speech Analysis: A Dataset of Danish Speech and Machine Learning Approach
by: Sankey-Olsen, Cuno, et al.
Published: (2025)
Combining Audio and Non-Audio Inputs in Evolved Neural Networks for Ovenbird
by: Hernandez, Sergio Poo, et al.
Published: (2025)
Explainable Detection of Machine Generated Music and Early Systematic Evaluation
by: Li, Yupei, et al.
Published: (2024)
Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
by: Moritz, Martin, et al.
Published: (2024)
emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)
Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control
by: Fuglsig, Andreas Jonas, et al.
Published: (2026)
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach
by: Christ, Lukas, et al.
Published: (2024)
Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline
by: Aziz, Shiran, et al.
Published: (2024)
Lightweight Implicit Neural Network for Binaural Audio Synthesis
by: Lu, Xikun, et al.
Published: (2025)
M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
by: Li, Yupei, et al.
Published: (2024)
Using voice analysis as an early indicator of risk for depression in young adults
by: Scherer, Klaus R., et al.
Published: (2024)
Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)
Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2024)
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)
TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification
by: Xiao, Zhenyuan, et al.
Published: (2024)
VCNAC: A Variable-Channel Neural Audio Codec for Mono, Stereo, and Surround Sound
by: Grötschla, Florian, et al.
Published: (2026)
Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification
by: Araz, R. Oguz, et al.
Published: (2025)