| Main Authors: | Wang, Zihao, Ma, Le, Zhang, Chen, Han, Bo, Xu, Yunfei, Wang, Yikai, Chen, Xinyi, Hong, HaoRong, Liu, Wenbo, Wu, Xinda, Zhang, Kejun |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.08029 |
Similar Items
Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)
Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)
OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)
PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
by: Yi, Yungang, et al.
Published: (2024)
Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)
Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)
Dichotic harmony for the musical practice
by: Madgazin, Vadim R.
Published: (2010)
Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)
ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)
Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)
SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
by: Donepudi, Dharma Teja
Published: (2025)
GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)
Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
by: Cho, Hyunsung, et al.
Published: (2024)
Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)
Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
by: Cui, Hanfang, et al.
Published: (2025)
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
by: Woodard, Brandon, et al.
Published: (2025)
Enhanced DareFightingICE Competitions: Sound Design and AI Competitions
by: Khan, Ibrahim, et al.
Published: (2024)
Compositional Phoneme Approximation for L1-Grounded L2 Pronunciation Training
by: Park, Jisang, et al.
Published: (2024)
acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices
by: Vuilliomenet, Aude, et al.
Published: (2025)
Window Size Versus Accuracy Experiments in Voice Activity Detectors
by: McKinnon, Max, et al.
Published: (2026)
M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algorithms, Probabilistic methods and GPT Models in any Progression and Time Signature
by: Poćwiardowski, Jakub, et al.
Published: (2024)
AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System
by: Blanchard, Lancelot, et al.
Published: (2025)
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)
MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
by: Hiraki, Hirotaka, et al.
Published: (2025)
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)
Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)
Fighting Game Adaptive Background Music for Improved Gameplay
by: Khan, Ibrahim, et al.
Published: (2024)
Can pre-trained Deep Learning models predict groove ratings?
by: Marmoret, Axel, et al.
Published: (2026)
Acoustic Wave Modeling Using 2D FDTD: Applications in Unreal Engine For Dynamic Sound Rendering
by: Samsurya, Bilkent
Published: (2025)
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)
A Framework for Multimodal Medical Image Interaction
by: Schütz, Laura, et al.
Published: (2024)
Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions
by: Pacheco-Gonzalez, Alberto, et al.
Published: (2023)
STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
by: Opria, Joshua
Published: (2026)
An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire
by: Jalbert-Desforges, Fred
Published: (2026)