:: Library Catalog

Bibliographic Details
Main Author:	Elias, Noel
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.21557

Similar Items

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks
by: Elias, Noel
Published: (2024)

Automatic Contextual Audio Denoising
by: Luong, Diep, et al.
Published: (2026)

HRTF Estimation using a Score-based Prior
by: Thuillier, Etienne, et al.
Published: (2024)

Denoising by neural network for muzzle blast detection
by: Pujol, Hadrien, et al.
Published: (2025)

Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
by: Saito, Koichi, et al.
Published: (2024)

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)

Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
by: Kim, Minsu, et al.
Published: (2025)

Neural Speech Extraction with Human Feedback
by: Itani, Malek, et al.
Published: (2025)

Uncertainty-Aware Mean Opinion Score Prediction
by: Wang, Hui, et al.
Published: (2024)

Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)

PBSCR: The Piano Bootleg Score Composer Recognition Dataset
by: Jain, Arhan, et al.
Published: (2024)

Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)

Identifying birdsong syllables without labelled data
by: Teng, Mélisande, et al.
Published: (2025)

End-to-end Piano Performance-MIDI to Score Conversion with Transformers
by: Beyer, Tim, et al.
Published: (2024)

FlowTSE: Target Speaker Extraction with Flow Matching
by: Navon, Aviv, et al.
Published: (2025)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Beat this! Accurate beat tracking without DBN postprocessing
by: Foscarin, Francesco, et al.
Published: (2024)

A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews
by: Dia, Mamadou, et al.
Published: (2024)

ASTRA: Aligning Speech and Text Representations for Asr without Sampling
by: Gaur, Neeraj, et al.
Published: (2024)

Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
by: Yan, Yujia, et al.
Published: (2024)

TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)

SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction
by: Chen, Tuochao, et al.
Published: (2025)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

SeMaScore : a new evaluation metric for automatic speech recognition tasks
by: Sasindran, Zitha, et al.
Published: (2024)

An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
by: Lo, Tien-Hong, et al.
Published: (2025)

Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music
by: Tunturi, Eetu, et al.
Published: (2025)

MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
by: Chae, Yunkee, et al.
Published: (2025)

Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs
by: An, Joesph, et al.
Published: (2026)

Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition
by: Razig, Amine, et al.
Published: (2025)

Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2026)

Autoregressive Guidance of Deep Spatially Selective Filters using Bayesian Tracking for Efficient Extraction of Moving Speakers
by: Kienegger, Jakob, et al.
Published: (2026)

Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2025)

Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
by: Wang, Jun-You, et al.
Published: (2025)

Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
by: Kienegger, Jakob, et al.
Published: (2025)

Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
by: Milling, Manuel, et al.
Published: (2023)

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)

A data-driven two-microphone method for in-situ sound absorption measurements
by: Emmerich, Leon, et al.
Published: (2025)

Unrolled Creative Adversarial Network For Generating Novel Musical Pieces
by: Nag, Pratik
Published: (2024)