| Main Authors: | Lin, Zizhen, Chen, Xiaoting, Wang, Junyu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04589 |
Similar Items
SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
by: Sabra, Adam, et al.
Published: (2024)
LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)
Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting
by: Wilkinghoff, Kevin, et al.
Published: (2024)
Language-based Audio Retrieval with Co-Attention Networks
by: Sun, Haoran, et al.
Published: (2024)
Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning
by: Pan, Xiaofeng, et al.
Published: (2025)
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
by: Xin, Yifei, et al.
Published: (2024)
Evaluating Interval-based Tokenization for Pitch Representation in Symbolic Music Analysis
by: Le, Dinh-Viet-Toan, et al.
Published: (2025)
SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
by: Lin, Chyi-Jiunn, et al.
Published: (2024)
Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization
by: Altwlkany, Kemal, et al.
Published: (2024)
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering
by: Rackauckas, Zackary, et al.
Published: (2025)
Exploring Diverse Sounds: Identifying Outliers in a Music Corpus
by: Cai, Le, et al.
Published: (2024)
Music Discovery Dialogue Generation Using Human Intent Analysis and Large Language Models
by: Doh, SeungHeon, et al.
Published: (2024)
Track Role Prediction of Single-Instrumental Sequences
by: Han, Changheon, et al.
Published: (2024)
Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning
by: Zhang, Dengming, et al.
Published: (2024)
LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation
by: Salganik, Rebecca, et al.
Published: (2024)
Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings
by: Chowdhury, Shreyan, et al.
Published: (2024)
Exploring GPT's Ability as a Judge in Music Understanding
by: Fang, Kun, et al.
Published: (2025)
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
by: Wang, Qian, et al.
Published: (2024)
Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)
Automatic Estimation of Singing Voice Musical Dynamics
by: Narang, Jyoti, et al.
Published: (2024)
FusID: Modality-Fused Semantic IDs for Generative Music Recommendation
by: Kim, Haven, et al.
Published: (2026)
A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
by: Oncescu, Andreea-Maria, et al.
Published: (2024)
TALKPLAY: Multimodal Music Recommendation with Large Language Models
by: Doh, Seungheon, et al.
Published: (2025)
Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics
by: Zhang, Hanwen, et al.
Published: (2025)
Engraving Oriented Joint Estimation of Pitch Spelling and Local and Global Keys
by: Bouquillard, Augustin, et al.
Published: (2024)
Towards Computational Analysis of Pansori Singing
by: Park, Sangheon, et al.
Published: (2024)
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
by: Abootorabi, Mohammad Mahdi, et al.
Published: (2024)
EAViT: External Attention Vision Transformer for Audio Classification
by: Iqbal, Aquib, et al.
Published: (2024)
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
by: Gomez, Frank Palma, et al.
Published: (2024)
Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation
by: Yoo, HaeJun, et al.
Published: (2024)
Multi-Modal Retrieval For Large Language Model Based Speech Recognition
by: Kolehmainen, Jari, et al.
Published: (2024)
TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling
by: Doh, Seungheon, et al.
Published: (2025)
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
by: Bao, Xuchan, et al.
Published: (2024)
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval
by: Doh, SeungHeon, et al.
Published: (2024)
JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval
by: Wei, Haojie, et al.
Published: (2023)
Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection
by: Wei, Weixing, et al.
Published: (2025)
Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT
by: Tavares, Tiago Fernandes, et al.
Published: (2025)
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
by: Oh, Yoori, et al.
Published: (2024)
Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval
by: Lin, Junan, et al.
Published: (2025)
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2025)