:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Huan; Chowdhury, Shreyan; Cancino-Chacón, Carlos Eduardo; Liang, Jinhua; Dixon, Simon; Widmer, Gerhard
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.14850

Similar Items

Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings
by: Chowdhury, Shreyan, et al.
Published: (2024)

From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano
by: Zhang, Huan, et al.
Published: (2024)

Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance
by: Peter, Silvan David, et al.
Published: (2023)

Towards Musically Informed Evaluation of Piano Transcription Models
by: Hu, Patricia, et al.
Published: (2024)

RenderBox: Expressive Performance Rendering with Text Control
by: Zhang, Huan, et al.
Published: (2025)

Hierarchical Symbolic Pop Music Generation with Graph Neural Networks
by: Lim, Wen Qing, et al.
Published: (2024)

LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment
by: Zhang, Huan, et al.
Published: (2024)

From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems
by: Zhang, Huan, et al.
Published: (2025)

How to Infer Repeat Structures in MIDI Performances
by: Peter, Silvan, et al.
Published: (2025)

How does the teacher rate? Observations from the NeuroPiano dataset
by: Zhang, Huan, et al.
Published: (2024)

Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining
by: Greif, Jonathan, et al.
Published: (2024)

Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation
by: Fichtinger, Alexander, et al.
Published: (2025)

Emotion-Aware Speech Generation with Character-Specific Voices for Comics
by: Qian, Zhiwen, et al.
Published: (2025)

Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline
by: Riley, Xavier, et al.
Published: (2024)

Enhanced Automatic Drum Transcription via Drum Stem Source Separation
by: Riley, Xavier, et al.
Published: (2025)

Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces
by: Bjare, Mathias Rose, et al.
Published: (2025)

WavCraft: Audio Editing and Generation with Large Language Models
by: Liang, Jinhua, et al.
Published: (2024)

GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model
by: Riley, Xavier, et al.
Published: (2024)

Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes
by: Guo, Zixun, et al.
Published: (2025)

SMUG-Explain: A Framework for Symbolic Music Graph Explanations
by: Karystinaios, Emmanouil, et al.
Published: (2024)

Pairing Real-Time Piano Transcription with Symbol-level Tracking for Precise and Robust Score Following
by: Peter, Silvan, et al.
Published: (2025)

TheGlueNote: Learned Representations for Robust and Flexible Note Alignment
by: Peter, Silvan David, et al.
Published: (2024)

Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models
by: Morocutti, Tobias, et al.
Published: (2025)

Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
by: Hisariya, Tanisha, et al.
Published: (2024)

Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities
by: Liang, Jinhua, et al.
Published: (2023)

Multi-Iteration Multi-Stage Fine-Tuning of Transformers for Sound Event Detection with Heterogeneous Datasets
by: Schmid, Florian, et al.
Published: (2024)

Language Models for Music Medicine Generation
by: Nikolakakis, Emmanouil, et al.
Published: (2024)

Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)

Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition
by: Hoedt, Katharina, et al.
Published: (2025)

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection
by: Liang, Jinhua, et al.
Published: (2024)

Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training
by: Schmid, Florian, et al.
Published: (2024)

RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
by: Chang, Sungkyun, et al.
Published: (2025)

High Resolution Guitar Transcription via Domain Adaptation
by: Riley, Xavier, et al.
Published: (2024)

Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)

Beat this! Accurate beat tracking without DBN postprocessing
by: Foscarin, Francesco, et al.
Published: (2024)

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)

Effective Pre-Training of Audio Transformers for Sound Event Detection
by: Schmid, Florian, et al.
Published: (2024)

AudioMorphix: Training-free audio editing with diffusion probabilistic models
by: Liang, Jinhua, et al.
Published: (2025)

Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control
by: Zhou, Wangzixi, et al.
Published: (2026)

Controlling Surprisal in Music Generation via Information Content Curve Matching
by: Bjare, Mathias Rose, et al.
Published: (2024)