Saved in:
| Main Author: | Ziaoddini, Kajwan |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.00006 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Who Gets Heard? Rethinking Fairness in AI for Music Systems
by: Mehta, Atharva, et al.
Published: (2025)
by: Mehta, Atharva, et al.
Published: (2025)
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)
by: Zhou, Ziya, et al.
Published: (2024)
LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad
by: Xu, Siting, et al.
Published: (2023)
by: Xu, Siting, et al.
Published: (2023)
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
by: Ma, Ziyang, et al.
Published: (2025)
by: Ma, Ziyang, et al.
Published: (2025)
Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance
by: Li, Yuanchao, et al.
Published: (2024)
by: Li, Yuanchao, et al.
Published: (2024)
Cross-Modal Learning for Music-to-Music-Video Description Generation
by: Mao, Zhuoyuan, et al.
Published: (2025)
by: Mao, Zhuoyuan, et al.
Published: (2025)
Beat-Based Rhythm Quantization of MIDI Performances
by: Wachter, Maximilian, et al.
Published: (2025)
by: Wachter, Maximilian, et al.
Published: (2025)
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)
by: Weck, Benno, et al.
Published: (2024)
Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses
by: Doukhan, David, et al.
Published: (2024)
by: Doukhan, David, et al.
Published: (2024)
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
by: Deng, Qixin, et al.
Published: (2024)
by: Deng, Qixin, et al.
Published: (2024)
Fine-Tuning MIDI-to-Audio Alignment using a Neural Network on Piano Roll and CQT Representations
by: Murgul, Sebastian, et al.
Published: (2025)
by: Murgul, Sebastian, et al.
Published: (2025)
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
by: Mao, Zhuoyuan, et al.
Published: (2025)
by: Mao, Zhuoyuan, et al.
Published: (2025)
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
by: Deng, Zihao, et al.
Published: (2023)
by: Deng, Zihao, et al.
Published: (2023)
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)
by: Qian, Yikai, et al.
Published: (2024)
Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)
by: Han, Sangjun, et al.
Published: (2024)
Optimizing Feature Extraction for Symbolic Music
by: Simonetta, Federico, et al.
Published: (2023)
by: Simonetta, Federico, et al.
Published: (2023)
OpenMU: Your Swiss Army Knife for Music Understanding
by: Zhao, Mengjie, et al.
Published: (2024)
by: Zhao, Mengjie, et al.
Published: (2024)
Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)
by: Sun, Yujia, et al.
Published: (2024)
Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning
by: Wu, Shu, et al.
Published: (2025)
by: Wu, Shu, et al.
Published: (2025)
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)
by: Wang, Junyu, et al.
Published: (2025)
MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
by: Guan, Yiwen, et al.
Published: (2025)
by: Guan, Yiwen, et al.
Published: (2025)
Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models
by: Yang, Hao, et al.
Published: (2025)
by: Yang, Hao, et al.
Published: (2025)
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
by: Hamberger, Anna, et al.
Published: (2025)
by: Hamberger, Anna, et al.
Published: (2025)
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)
by: Li, Fengjin, et al.
Published: (2025)
ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction
by: Wu, Wenxuan, et al.
Published: (2025)
by: Wu, Wenxuan, et al.
Published: (2025)
CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework
by: Ning, Jinzhong, et al.
Published: (2025)
by: Ning, Jinzhong, et al.
Published: (2025)
Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)
by: Murgul, Sebastian, et al.
Published: (2025)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
by: Ma, Ziyang, et al.
Published: (2025)
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
by: Diao, Xingjian, et al.
Published: (2025)
by: Diao, Xingjian, et al.
Published: (2025)
Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
by: He, Jianfeng, et al.
Published: (2023)
by: He, Jianfeng, et al.
Published: (2023)
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
by: Xu, David
Published: (2024)
by: Xu, David
Published: (2024)
MMSD-Net: Towards Multi-modal Stuttering Detection
by: Nie, Liangyu, et al.
Published: (2024)
by: Nie, Liangyu, et al.
Published: (2024)
Double Mixture: Towards Continual Event Detection from Speech
by: Kang, Jingqi, et al.
Published: (2024)
by: Kang, Jingqi, et al.
Published: (2024)
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction
by: He, Jiajun, et al.
Published: (2024)
by: He, Jiajun, et al.
Published: (2024)
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
by: Yang, Hao, et al.
Published: (2024)
by: Yang, Hao, et al.
Published: (2024)
Resurfacing Paralinguistic Awareness in Large Audio Language Models
by: Yang, Hao, et al.
Published: (2026)
by: Yang, Hao, et al.
Published: (2026)
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
by: Mohapatra, Payal, et al.
Published: (2024)
by: Mohapatra, Payal, et al.
Published: (2024)
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)
by: Li, Yuanchao, et al.
Published: (2024)
Learning Audio Concepts from Counterfactual Natural Language
by: Vosoughi, Ali, et al.
Published: (2024)
by: Vosoughi, Ali, et al.
Published: (2024)
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)
by: Mei, Xinhao, et al.
Published: (2023)
Similar Items
-
Who Gets Heard? Rethinking Fairness in AI for Music Systems
by: Mehta, Atharva, et al.
Published: (2025) -
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024) -
LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad
by: Xu, Siting, et al.
Published: (2023) -
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
by: Ma, Ziyang, et al.
Published: (2025) -
Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance
by: Li, Yuanchao, et al.
Published: (2024)