| Main Authors: | Hinrichs, Reemt; Ostermann, Jörn |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.02424 |
Similar Items
Scalable Speech Enhancement with Dynamic Channel Pruning
by: Miccini, Riccardo, et al.
Published: (2024)
OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting
by: Risso, Matteo, et al.
Published: (2026)
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
by: Wang, Lun
Published: (2024)
Music2Latent: Consistency Autoencoders for Latent Audio Compression
by: Pasini, Marco, et al.
Published: (2024)
From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
by: Miccini, Riccardo, et al.
Published: (2026)
FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms
by: Shree, Atul, et al.
Published: (2025)
Text-Independent Speaker Identification Using Audio Looping With Margin Based Loss Functions
by: Garcia, Elliot Q C, et al.
Published: (2025)
Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)
Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)
Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)
wav2pos: Sound Source Localization using Masked Autoencoders
by: Berg, Axel, et al.
Published: (2024)
Music Emotion Prediction Using Recurrent Neural Networks
by: Chang, Xinyu, et al.
Published: (2024)
SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)
CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition
by: Ziogas, Ioannis, et al.
Published: (2024)
Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)
Vocal Melody Construction for Persian Lyrics Using LSTM Recurrent Neural Networks
by: Jafari, Farshad, et al.
Published: (2024)
DEMONet: Underwater Acoustic Target Recognition based on Multi-Expert Network and Cross-Temporal Variational Autoencoder
by: Xie, Yuan, et al.
Published: (2024)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
by: Ji, Shengpeng, et al.
Published: (2024)
Investigation of Time-Frequency Feature Combinations with Histogram Layer Time Delay Neural Networks
by: Mohammadi, Amirmohammad, et al.
Published: (2024)
Synthetic data enables context-aware bioacoustic sound event detection
by: Hoffman, Benjamin, et al.
Published: (2025)
Zero-shot Voice Conversion with Diffusion Transformers
by: Liu, Songting
Published: (2024)
Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)
Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech
by: Samanta, Himadri S
Published: (2026)
Context-aware child-directed speech detection from long-form recordings
by: Charlot, Théo, et al.
Published: (2026)
Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music
by: Lukoianov, Aleksandr, et al.
Published: (2025)
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)
AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
by: Biju, Emil, et al.
Published: (2024)
Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)
Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification
by: Sims, Ysobel, et al.
Published: (2024)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)
Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)
Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)
Audio Processing using Pattern Recognition for Music Genre Classification
by: Chatterjee, Sivangi, et al.
Published: (2024)
Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
by: Irigoyen, Julian, et al.
Published: (2025)
SepPrune: Structured Pruning for Efficient Deep Speech Separation
by: Li, Yuqi, et al.
Published: (2025)
GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
by: Watanabe, Chihiro, et al.
Published: (2024)
Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)