:: Library Catalog

Bibliographic Details
Main Authors:	McGettrick, Michael; McGettrick, Paul
Format:	Preprint
Published:	2024
Subjects:	Information Theory; Computation and Language; Computer Vision and Pattern Recognition; Information Retrieval; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.12000

Similar Items

Can Impressions of Music be Extracted from Thumbnail Images?
by: Harada, Takashi, et al.
Published: (2025)

Multimodal Transformer Distillation for Audio-Visual Synchronization
by: Chen, Xuanjun, et al.
Published: (2022)

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
by: Hamilton, Mark, et al.
Published: (2024)

Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval
by: Lin, Junan, et al.
Published: (2025)

WikiMuTe: A web-sourced dataset of semantic descriptions for music audio
by: Weck, Benno, et al.
Published: (2023)

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
by: Gomez, Frank Palma, et al.
Published: (2024)

CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
by: Abootorabi, Mohammad Mahdi, et al.
Published: (2024)

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
by: Lin, Chyi-Jiunn, et al.
Published: (2024)

Analyzing Byte-Pair Encoding on Monophonic and Polyphonic Symbolic Music: A Focus on Musical Phrase Segmentation
by: Le, Dinh-Viet-Toan, et al.
Published: (2024)

Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning
by: Zeng, Donghuo, et al.
Published: (2025)

Ecologically Valid Benchmarking and Adaptive Attention: Scalable Marine Bioacoustic Monitoring
by: Rasmussen, Nicholas R., et al.
Published: (2025)

Multi-Modal Retrieval For Large Language Model Based Speech Recognition
by: Kolehmainen, Jari, et al.
Published: (2024)

I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
by: Vasilakis, Yannis, et al.
Published: (2024)

A GEN AI Framework for Medical Note Generation
by: Leong, Hui Yi, et al.
Published: (2024)

More than words: Advancements and challenges in speech recognition for singing
by: Kruspe, Anna
Published: (2024)

Beyond Musical Descriptors: Extracting Preference-Bearing Intent in Music Queries
by: Baranes, Marion, et al.
Published: (2026)

Technical Report on classification of literature related to children speech disorder
by: Wang, Ziang, et al.
Published: (2025)

Navigating Speech Recording Collections with AI-Generated Illustrations
by: Håland, Sirina, et al.
Published: (2025)

FusID: Modality-Fused Semantic IDs for Generative Music Recommendation
by: Kim, Haven, et al.
Published: (2026)

MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2024)

Emergent musical properties of a transformer under contrastive self-supervised learning
by: Kong, Yuexuan, et al.
Published: (2025)

wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
by: Le-Duc, Khai, et al.
Published: (2024)

Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT
by: Tavares, Tiago Fernandes, et al.
Published: (2025)

The language of sound search: Examining User Queries in Audio Search Engines
by: Weck, Benno, et al.
Published: (2024)

Exploring Diverse Sounds: Identifying Outliers in a Music Corpus
by: Cai, Le, et al.
Published: (2024)

Music Discovery Dialogue Generation Using Human Intent Analysis and Large Language Models
by: Doh, SeungHeon, et al.
Published: (2024)

Track Role Prediction of Single-Instrumental Sequences
by: Han, Changheon, et al.
Published: (2024)

Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning
by: Zhang, Dengming, et al.
Published: (2024)

LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation
by: Salganik, Rebecca, et al.
Published: (2024)

Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings
by: Chowdhury, Shreyan, et al.
Published: (2024)

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
by: Xin, Yifei, et al.
Published: (2024)

Language-based Audio Retrieval with Co-Attention Networks
by: Sun, Haoran, et al.
Published: (2024)

Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting
by: Wilkinghoff, Kevin, et al.
Published: (2024)

Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
by: Wang, Qian, et al.
Published: (2024)

Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)

Automatic Estimation of Singing Voice Musical Dynamics
by: Narang, Jyoti, et al.
Published: (2024)

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
by: Oncescu, Andreea-Maria, et al.
Published: (2024)

Engraving Oriented Joint Estimation of Pitch Spelling and Local and Global Keys
by: Bouquillard, Augustin, et al.
Published: (2024)

Towards Computational Analysis of Pansori Singing
by: Park, Sangheon, et al.
Published: (2024)

VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering
by: Rackauckas, Zackary, et al.
Published: (2025)