:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zihao; Ma, Le; Zhang, Chen; Han, Bo; Xu, Yunfei; Wang, Yikai; Chen, Xinyi; Hong, HaoRong; Liu, Wenbo; Wu, Xinda; Zhang, Kejun
Format:	Preprint
Published:	2023
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing; H.5.5; F.2.2
Online Access:	https://arxiv.org/abs/2305.08029

Similar Items

Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)

Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)

OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)

PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
by: Yi, Yungang, et al.
Published: (2024)

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)

Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)

Dichotic harmony for the musical practice
by: Madgazin, Vadim R.
Published: (2010)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)

SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
by: Donepudi, Dharma Teja
Published: (2025)

GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)

Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
by: Cho, Hyunsung, et al.
Published: (2024)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
by: Cui, Hanfang, et al.
Published: (2025)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
by: Woodard, Brandon, et al.
Published: (2025)

Enhanced DareFightingICE Competitions: Sound Design and AI Competitions
by: Khan, Ibrahim, et al.
Published: (2024)

Compositional Phoneme Approximation for L1-Grounded L2 Pronunciation Training
by: Park, Jisang, et al.
Published: (2024)

acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices
by: Vuilliomenet, Aude, et al.
Published: (2025)

Window Size Versus Accuracy Experiments in Voice Activity Detectors
by: McKinnon, Max, et al.
Published: (2026)

M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algorithms, Probabilistic methods and GPT Models in any Progression and Time Signature
by: Poćwiardowski, Jakub, et al.
Published: (2024)

AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System
by: Blanchard, Lancelot, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)

MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
by: Hiraki, Hirotaka, et al.
Published: (2025)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Fighting Game Adaptive Background Music for Improved Gameplay
by: Khan, Ibrahim, et al.
Published: (2024)

Can pre-trained Deep Learning models predict groove ratings?
by: Marmoret, Axel, et al.
Published: (2026)

Acoustic Wave Modeling Using 2D FDTD: Applications in Unreal Engine For Dynamic Sound Rendering
by: Samsurya, Bilkent
Published: (2025)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

A Framework for Multimodal Medical Image Interaction
by: Schütz, Laura, et al.
Published: (2024)

Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions
by: Pacheco-Gonzalez, Alberto, et al.
Published: (2023)

STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
by: Opria, Joshua
Published: (2026)

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire
by: Jalbert-Desforges, Fred
Published: (2026)