| Main Authors: | Ni-Hahn, Stephen, Xu, Weihan, Yin, Jerry, Zhu, Rico, Mak, Simon, Jiang, Yue, Rudin, Cynthia |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.07184 |
Similar Items
AutoSchA: Automatic Hierarchical Music Representations via Multi-Relational Node Isolation
by: Ni-Hahn, Stephen, et al.
Published: (2025)
ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)
YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation
by: Lu, Shao-Chien, et al.
Published: (2025)
ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
by: Ma, Menghe, et al.
Published: (2026)
Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
by: Wang, Tongxi, et al.
Published: (2025)
EMelodyGen: Emotion-Conditioned Melody Generation in ABC Notation with the Musical Feature Template
by: Zhou, Monan, et al.
Published: (2023)
Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing
by: Sebastian, Rinku, et al.
Published: (2026)
What Do Language Models Hear? Probing for Auditory Representations in Language Models
by: Ngo, Jerry, et al.
Published: (2024)
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
by: Tang, Mingni, et al.
Published: (2025)
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
by: Bradshaw, Louis, et al.
Published: (2025)
Evaluation of Deep Audio Representations for Hearables
by: Gröger, Fabian, et al.
Published: (2025)
Infant Cry Detection Using Causal Temporal Representation
by: Fu, Minghao, et al.
Published: (2025)
Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset
by: Xu, Weihan, et al.
Published: (2024)
Cross-Domain Audio Deepfake Detection: Dataset and Analysis
by: Li, Yuang, et al.
Published: (2024)
SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
by: Staněk, Vojtěch, et al.
Published: (2025)
Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations
by: Han, Yichen, et al.
Published: (2025)
Tadabur: A Large-Scale Quran Audio Dataset
by: Alherran, Faisal
Published: (2026)
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
by: Chen, Qianniu, et al.
Published: (2025)
DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions
by: Dione, Michel, et al.
Published: (2026)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)
Deepfake Audio Detection Using Self-supervised Fusion Representations
by: Zaman, Khalid, et al.
Published: (2026)
Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
by: Bjare, Mathias Rose, et al.
Published: (2025)
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)
Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input
by: Liu, Changsong, et al.
Published: (2026)
Layer-wise Investigation of Large-Scale Self-Supervised Music Representation Models
by: Zhou, Yizhi, et al.
Published: (2025)
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields
by: Versace, Plein
Published: (2025)
Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
by: Kim, Jin Sob, et al.
Published: (2025)
Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
by: Wiafe, Isaac, et al.
Published: (2026)
HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)
Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation
by: Liu, Shuyang, et al.
Published: (2025)
MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata
by: Roy, Abhinaba, et al.
Published: (2025)
Structure-Aware Piano Accompaniment via Style Planning and Dataset-Aligned Pattern Retrieval
by: Zang, Wanyu, et al.
Published: (2026)
Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition
by: Wang, Zihao, et al.
Published: (2025)
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
by: Jiang, Xue, et al.
Published: (2025)
UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
by: Chen, Yuxuan, et al.
Published: (2026)
Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music
by: Chauhan, Shivam, et al.
Published: (2026)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
DDFAD: Dataset Distillation Framework for Audio Data
by: Jiang, Wenbo, et al.
Published: (2024)