Library Catalog

Bibliographic Details
Main Author:	Ogg, Mattson
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.02366

Similar Items

Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
by: Ogg, Mattson, et al.
Published: (2025)

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)

Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts
by: Pelloin, Valentin, et al.
Published: (2026)

GigaAM: Efficient Self-Supervised Learner for Speech Recognition
by: Kutsakov, Aleksandr, et al.
Published: (2025)

Self-Supervised Multi-View Learning for Disentangled Music Audio Representations
by: Wilkins, Julia, et al.
Published: (2024)

Domain-Incremental Learning for Audio Classification
by: Mulimani, Manjunath, et al.
Published: (2024)

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
by: Han, Bing, et al.
Published: (2025)

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
by: Heggan, Calum, et al.
Published: (2024)

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
by: Shams, Siavash, et al.
Published: (2024)

Pitch Contour Exploration Across Audio Domains: A Vision-Based Transfer Learning Approach
by: Abeßer, Jakob, et al.
Published: (2025)

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
by: Wu, Shih-Lun, et al.
Published: (2023)

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
by: Bhati, Saurabhchand, et al.
Published: (2024)

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
by: Yang, Dongchao, et al.
Published: (2024)

Transfer Learning for Paediatric Sleep Apnoea Detection Using Physiology-Guided Acoustic Models
by: Niu, Chaoyue, et al.
Published: (2025)

Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification
by: Cai, Yiqiang, et al.
Published: (2024)

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
by: Cappellazzo, Umberto, et al.
Published: (2023)

Leveraging Self-Supervised Audio-Visual Pretrained Models to Improve Vocoded Speech Intelligibility in Cochlear Implant Simulation
by: Lai, Richard Lee, et al.
Published: (2023)

Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier
by: Guo, Yinlin, et al.
Published: (2023)

DeePAQ: A Perceptual Audio Quality Metric Based On Foundational Models and Weakly Supervised Learning
by: Jiang, Guanxin, et al.
Published: (2025)

Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
by: Yang, Bing, et al.
Published: (2023)

DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training
by: Liu, Shengqiang, et al.
Published: (2024)

Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models
by: Zucatelli, Guilherme, et al.
Published: (2025)

AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events
by: Dutta, Sagar, et al.
Published: (2025)

SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution
by: Im, Jaekwon, et al.
Published: (2025)

Acoustic Teleportation via Disentangled Neural Audio Codec Representations
by: Grundhuber, Philipp, et al.
Published: (2025)

Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification
by: Liu, Weixin, et al.
Published: (2026)

Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
by: Park, Joonyong, et al.
Published: (2024)

MiMo-Audio: Audio Language Models are Few-Shot Learners
by: Core Team, et al.
Published: (2025)

Sub-band Domain Multi-Hypothesis Acoustic Echo Canceler Based Acoustic Scene Analysis
by: Southwell, Benjamin J, et al.
Published: (2025)

Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)

Domain Adaptation for Contrastive Audio-Language Models
by: Deshmukh, Soham, et al.
Published: (2024)

Bias and Fairness in Self-Supervised Acoustic Representations for Cognitive Impairment Detection
by: Gulzar, Kashaf, et al.
Published: (2026)

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)

Past, Present, and Future of Spatial Audio and Room Acoustics
by: Koyama, Shoichi, et al.
Published: (2025)

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)

Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
by: Blandón, María Andrea Cruz, et al.
Published: (2025)

SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation
by: Zhang, Yizhou, et al.
Published: (2025)

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks
by: Elias, Noel
Published: (2024)

MATS: An Audio Language Model under Text-only Supervision
by: Wang, Wen, et al.
Published: (2025)