:: Library Catalog

Bibliographic Details
Main Authors:	Hinrichs, Reemt; Ostermann, Jörn
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.02424

Similar Items

Scalable Speech Enhancement with Dynamic Channel Pruning
by: Miccini, Riccardo, et al.
Published: (2024)

OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting
by: Risso, Matteo, et al.
Published: (2026)

Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
by: Wang, Lun
Published: (2024)

Music2Latent: Consistency Autoencoders for Latent Audio Compression
by: Pasini, Marco, et al.
Published: (2024)

From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
by: Miccini, Riccardo, et al.
Published: (2026)

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms
by: Shree, Atul, et al.
Published: (2025)

Text-Independent Speaker Identification Using Audio Looping With Margin Based Loss Functions
by: Garcia, Elliot Q C, et al.
Published: (2025)

Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)

Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)

Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)

wav2pos: Sound Source Localization using Masked Autoencoders
by: Berg, Axel, et al.
Published: (2024)

Music Emotion Prediction Using Recurrent Neural Networks
by: Chang, Xinyu, et al.
Published: (2024)

SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)

CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition
by: Ziogas, Ioannis, et al.
Published: (2024)

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)

Vocal Melody Construction for Persian Lyrics Using LSTM Recurrent Neural Networks
by: Jafari, Farshad, et al.
Published: (2024)

DEMONet: Underwater Acoustic Target Recognition based on Multi-Expert Network and Cross-Temporal Variational Autoencoder
by: Xie, Yuan, et al.
Published: (2024)

Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
by: Ji, Shengpeng, et al.
Published: (2024)

Investigation of Time-Frequency Feature Combinations with Histogram Layer Time Delay Neural Networks
by: Mohammadi, Amirmohammad, et al.
Published: (2024)

Synthetic data enables context-aware bioacoustic sound event detection
by: Hoffman, Benjamin, et al.
Published: (2025)

Zero-shot Voice Conversion with Diffusion Transformers
by: Liu, Songting
Published: (2024)

Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)

Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech
by: Samanta, Himadri S
Published: (2026)

Context-aware child-directed speech detection from long-form recordings
by: Charlot, Théo, et al.
Published: (2026)

Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music
by: Lukoianov, Aleksandr, et al.
Published: (2025)

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
by: Biju, Emil, et al.
Published: (2024)

Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)

Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification
by: Sims, Ysobel, et al.
Published: (2024)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)

Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)

Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)

Audio Processing using Pattern Recognition for Music Genre Classification
by: Chatterjee, Sivangi, et al.
Published: (2024)

Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
by: Irigoyen, Julian, et al.
Published: (2025)

SepPrune: Structured Pruning for Efficient Deep Speech Separation
by: Li, Yuqi, et al.
Published: (2025)

GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
by: Watanabe, Chihiro, et al.
Published: (2024)

Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)