Library Catalog

Bibliographic Details
Main Authors:	Cwitkowitz, Frank; Duan, Zhiyao
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2402.15569

Similar Items

Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
by: Cwitkowitz, Frank, et al.
Published: (2025)

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription
by: Zang, Yongyi, et al.
Published: (2023)

Translation-Equivariant Self-Supervised Learning for Pitch Estimation with Optimal Transport
by: Torres, Bernardo, et al.
Published: (2025)

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
by: Chen, Meiying, et al.
Published: (2022)

Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
by: Yan, Yujia, et al.
Published: (2024)

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
by: Cwitkowitz, Frank, et al.
Published: (2023)

PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective
by: Riou, Alain, et al.
Published: (2025)

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
by: Miara, Victor, et al.
Published: (2024)

Self-Supervised Embeddings for Detecting Individual Symptoms of Depression
by: Dumpala, Sri Harsha, et al.
Published: (2024)

Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders
by: Ellinas, Nikolaos, et al.
Published: (2025)

HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks
by: Zhang, Zhe, et al.
Published: (2024)

MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations
by: Heggan, Calum, et al.
Published: (2023)

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
by: Hono, Yukiya, et al.
Published: (2024)

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
by: Fu, Szu-Wei, et al.
Published: (2024)

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
by: Fu, Yonggan, et al.
Published: (2022)

Singer Identity Representation Learning using Self-Supervised Techniques
by: Torres, Bernardo, et al.
Published: (2024)

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
by: Wu, Haibin, et al.
Published: (2021)

Self-Supervised Learning for Speaker Recognition: A study and review
by: Lepage, Theo, et al.
Published: (2026)

Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)

PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective
by: Riou, Alain, et al.
Published: (2023)

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
by: Heggan, Calum, et al.
Published: (2024)

The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
by: Vaessen, Nik, et al.
Published: (2024)

Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling
by: Lepage, Theo, et al.
Published: (2025)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)

Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
by: Lepage, Theo, et al.
Published: (2024)

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning
by: Lepage, Théo, et al.
Published: (2022)

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
by: Brima, Yusuf, et al.
Published: (2023)

Towards Early Prediction of Self-Supervised Speech Model Performance
by: Whetten, Ryan, et al.
Published: (2025)

Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings
by: Nallaguntla, Vamshi, et al.
Published: (2026)

Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)

Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features
by: Chen, Wei, et al.
Published: (2025)

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)

A Lightweight Slot-Attention Framework for Multi-Instrument Multi-Pitch Estimation
by: Taenzer, Michael
Published: (2026)

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)

HARP 2.0: Expanding Hosted, Asynchronous, Remote Processing for Deep Learning in the DAW
by: Benetatos, Christodoulos, et al.
Published: (2025)

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings
by: Charlot, Théo, et al.
Published: (2025)

Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
by: Menon, Aditya Srinivas, et al.
Published: (2026)

SwiftF0: Fast and Accurate Monophonic Pitch Detection
by: Nieradzik, Lars
Published: (2025)

SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification
by: Lepage, Theo, et al.
Published: (2025)

Cross-domain Neural Pitch and Periodicity Estimation
by: Morrison, Max, et al.
Published: (2023)