:: Library Catalog

Bibliographic Details
Main Authors:	Stankevich, A., Nechepurenko, I., Shevchenko, A., Gremyachikh, L., Ustyuzhanin, A., Vasyukov, A.
Format:	Preprint
Published:	2021
Subjects:	Machine Learning; Sound; Audio and Speech Processing; 86-10; 86A22; I.2.6
Online Access:	https://arxiv.org/abs/2110.08626

Similar Items

Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)

High-resolution closed-loop seismic inversion network in time-frequency phase mixed domain
by: Liu, Yingtian, et al.
Published: (2024)

Noise-Robust Keyword Spotting through Self-supervised Pretraining
by: Mørk, Jacob, et al.
Published: (2024)

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

A Dynamic Learning Observatory Reveals the Rapid Salinization of Satkhira, Bangladesh
by: Sarkar, Showmitra Kumar, et al.
Published: (2026)

The Nash-MTL-STCN Method For Prestack Three-Parameter Inversion
by: Liu, Yingtian, et al.
Published: (2024)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

Fast Diffusion Model For Seismic Data Noise Attenuation
by: Peng, Junheng, et al.
Published: (2024)

Seismic Data Strong Noise Attenuation Based on Diffusion Model and Principal Component Analysis
by: Peng, Junheng, et al.
Published: (2023)

TuneGenie: Reasoning-based LLM agents for preferential music generation
by: Pandey, Amitesh, et al.
Published: (2025)

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)

Three-dimensional inversion of gravity data using implicit neural representations and scientific machine learning
by: Mishra, Pankaj K, et al.
Published: (2025)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)

Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
by: Ferreira, Alexandre R., et al.
Published: (2023)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)

Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction
by: Willard, Jared D., et al.
Published: (2024)

A Multimodal Symphony: Integrating Taste and Sound through Generative AI
by: Spanio, Matteo, et al.
Published: (2025)

GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)

Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio
by: Poltronieri, Andrea, et al.
Published: (2024)

Symbolic Audio Classification via Modal Decision Tree Learning
by: Marzano, Enrico, et al.
Published: (2025)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations
by: Hauer, Christopher
Published: (2026)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)

Decoding Phone Pairs from MEG Signals Across Speech Modalities
by: de Zuazo, Xabier, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

Accurate typhoon intensity forecasts using a non-iterative spatiotemporal transformer model
by: Qu, Hongyu, et al.
Published: (2025)

A Dual-TransUNet Deep Learning Framework for Multi-Source Precipitation Merging and Improving Seasonal and Extreme Estimates
by: Ye, Yuchen, et al.
Published: (2026)

SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2025)