:: Library Catalog

Bibliographic Details
Main Author:	Berti, Leonardo
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Multimedia; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.07020

Similar Items

Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment
by: Li, Xiaoquan, et al.
Published: (2024)

Multimodal Dataset Normalization and Perceptual Validation for Music-Taste Correspondences
by: Spanio, Matteo, et al.
Published: (2026)

Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment
by: Luo, Weiliang
Published: (2024)

BERT-like Pre-training for Symbolic Piano Music Classification Tasks
by: Chou, Yi-Hui, et al.
Published: (2021)

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
by: Wang, Wupeng, et al.
Published: (2024)

Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
by: Wang, Jun-You, et al.
Published: (2025)

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
by: Pasquier, Philippe, et al.
Published: (2025)

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
by: Wang, Juncheng, et al.
Published: (2025)

Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction
by: Liu, Renhang, et al.
Published: (2024)

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
by: Wu, Wenxuan, et al.
Published: (2025)

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
by: Xu, Zhongweiyang, et al.
Published: (2025)

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
by: Zhang, Yixiao, et al.
Published: (2024)

Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform
by: Telila, Yohannis, et al.
Published: (2025)

Cinematic Audio Source Separation Using Visual Cues
by: Zhang, Kang, et al.
Published: (2026)

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
by: Alonso-Jiménez, Pablo, et al.
Published: (2024)

Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement
by: Lin, Meng-Ping, et al.
Published: (2025)

Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
by: Kim, Sungnyun, et al.
Published: (2025)

IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
by: Song, Zeyang, et al.
Published: (2025)

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
by: Tan, Hao Hao, et al.
Published: (2024)

Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
by: Bandyopadhyay, Tathagata
Published: (2024)

Generative AI for Music and Audio
by: Dong, Hao-Wen
Published: (2024)

PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
by: Long, Phillip, et al.
Published: (2024)

Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model
by: Karchkhadze, Tornike, et al.
Published: (2024)

Learning Normal Patterns in Musical Loops
by: Dadman, Shayan, et al.
Published: (2025)

Semantic Grouping Network for Audio Source Separation
by: Mo, Shentong, et al.
Published: (2024)

A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
by: Tseng, Li-Yang, et al.
Published: (2024)

Music Genre Classification: Ensemble Learning with Subcomponents-level Attention
by: Liu, Yichen, et al.
Published: (2024)

LM2D: Lyrics- and Music-Driven Dance Synthesis
by: Yin, Wenjie, et al.
Published: (2024)

Segment-Factorized Full-Song Generation on Symbolic Piano Music
by: Chen, Ping-Yi, et al.
Published: (2025)

The Name-Free Gap: Policy-Aware Stylistic Control in Music Generation
by: Nagarajan, Ashwin, et al.
Published: (2025)

SteerMusic: Enhanced Musical Consistency for Zero-shot Text-guided and Personalized Music Editing
by: Niu, Xinlei, et al.
Published: (2025)

MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition
by: Louro, Pedro Lima, et al.
Published: (2024)

Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
by: Zhu, Tingyu, et al.
Published: (2024)

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
by: Li, Peike, et al.
Published: (2023)

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
by: Ma, Yinghao, et al.
Published: (2026)

Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)

MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)

On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations
by: McCallum, Matthew C., et al.
Published: (2024)

Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
by: Shao, Keren, et al.
Published: (2024)