Library Catalog

Bibliographic Details
Main Authors:	Ni-Hahn, Stephen, Xu, Weihan, Yin, Jerry, Zhu, Rico, Mak, Simon, Jiang, Yue, Rudin, Cynthia
Format:	Preprint
Published:	2024
Subjects:	Sound, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.07184

Similar Items

AutoSchA: Automatic Hierarchical Music Representations via Multi-Relational Node Isolation
by: Ni-Hahn, Stephen, et al.
Published: (2025)

ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)

YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation
by: Lu, Shao-Chien, et al.
Published: (2025)

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
by: Ma, Menghe, et al.
Published: (2026)

Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
by: Wang, Tongxi, et al.
Published: (2025)

EMelodyGen: Emotion-Conditioned Melody Generation in ABC Notation with the Musical Feature Template
by: Zhou, Monan, et al.
Published: (2023)

Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing
by: Sebastian, Rinku, et al.
Published: (2026)

What Do Language Models Hear? Probing for Auditory Representations in Language Models
by: Ngo, Jerry, et al.
Published: (2024)

NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
by: Tang, Mingni, et al.
Published: (2025)

Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
by: Bradshaw, Louis, et al.
Published: (2025)

Evaluation of Deep Audio Representations for Hearables
by: Gröger, Fabian, et al.
Published: (2025)

Infant Cry Detection Using Causal Temporal Representation
by: Fu, Minghao, et al.
Published: (2025)

Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset
by: Xu, Weihan, et al.
Published: (2024)

Cross-Domain Audio Deepfake Detection: Dataset and Analysis
by: Li, Yuang, et al.
Published: (2024)

SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
by: Staněk, Vojtěch, et al.
Published: (2025)

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations
by: Han, Yichen, et al.
Published: (2025)

Tadabur: A Large-Scale Quran Audio Dataset
by: Alherran, Faisal
Published: (2026)

Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
by: Chen, Qianniu, et al.
Published: (2025)

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions
by: Dione, Michel, et al.
Published: (2026)

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)

Deepfake Audio Detection Using Self-supervised Fusion Representations
by: Zaman, Khalid, et al.
Published: (2026)

Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
by: Bjare, Mathias Rose, et al.
Published: (2025)

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)

Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input
by: Liu, Changsong, et al.
Published: (2026)

Layer-wise Investigation of Large-Scale Self-Supervised Music Representation Models
by: Zhou, Yizhi, et al.
Published: (2025)

Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)

NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields
by: Versace, Plein
Published: (2025)

Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
by: Kim, Jin Sob, et al.
Published: (2025)

Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
by: Wiafe, Isaac, et al.
Published: (2026)

HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)

Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation
by: Liu, Shuyang, et al.
Published: (2025)

MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata
by: Roy, Abhinaba, et al.
Published: (2025)

Structure-Aware Piano Accompaniment via Style Planning and Dataset-Aligned Pattern Retrieval
by: Zang, Wanyu, et al.
Published: (2026)

Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition
by: Wang, Zihao, et al.
Published: (2025)

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
by: Jiang, Xue, et al.
Published: (2025)

UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
by: Chen, Yuxuan, et al.
Published: (2026)

Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music
by: Chauhan, Shivam, et al.
Published: (2026)

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)

DDFAD: Dataset Distillation Framework for Audio Data
by: Jiang, Wenbo, et al.
Published: (2024)