| Main Authors: | Singh, Akanksha; Chen, Yi-Ping Phoebe; Arora, Vipul |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.16751 |
Similar Items
BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection
by: Singh, Anup, et al.
Published: (2025)
Attention-Based Audio Embeddings for Query-by-Example
by: Singh, Anup, et al.
Published: (2022)
BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
by: Singh, Anup, et al.
Published: (2024)
Cross-Lingual Query-by-Example Spoken Term Detection: A Transformer-Based Approach
by: Fatemeh, Allahdadi, et al.
Published: (2024)
Improving Active Learning for Melody Estimation by Disentangling Uncertainties
by: Jaiswal, Aayush, et al.
Published: (2025)
Explainable Deep Learning Analysis for Raga Identification in Indian Art Music
by: Singh, Parampreet, et al.
Published: (2024)
AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events
by: Dutta, Sagar, et al.
Published: (2025)
Identification and Clustering of Unseen Ragas in Indian Art Music
by: Singh, Parampreet, et al.
Published: (2024)
Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy
by: Kumar, Sumit, et al.
Published: (2026)
SyncNet: correlating objective for time delay estimation in audio signals
by: Raina, Akshay, et al.
Published: (2022)
Uncertainty Quantification in Melody Estimation using Histogram Representation
by: Saxena, Kavya Ranjan, et al.
Published: (2025)
Meta-learning-based percussion transcription and $t\bar{a}la$ identification from low-resource audio
by: Kodag, Rahul Bapusaheb, et al.
Published: (2025)
Weakly Supervised Tabla Stroke Transcription via TI-SDRM: A Rhythm-Aware Lattice Rescoring Framework
by: Kodag, Rahul Bapusaheb, et al.
Published: (2026)
Written Term Detection Improves Spoken Term Detection
by: Yusuf, Bolaji, et al.
Published: (2024)
Interactive singing melody extraction based on active adaptation
by: Saxena, Kavya Ranjan, et al.
Published: (2024)
$T\bar{a}laGen:$ A System for Automatic $T\bar{a}la$ Identification and Generation
by: Kodag, Rahul Bapusaheb, et al.
Published: (2024)
Learning from Limited Labels: Transductive Graph Label Propagation for Indian Music Analysis
by: Singh, Parampreet, et al.
Published: (2026)
Learning to Discover: A Generalized Framework for Raga Identification without Forgetting
by: Singh, Parampreet, et al.
Published: (2026)
Spoken-Term Discovery using Discrete Speech Units
by: van Niekerk, Benjamin, et al.
Published: (2024)
Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
by: Kumar, Sumit, et al.
Published: (2025)
HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages
by: Yan, Bi-Cheng, et al.
Published: (2025)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)
Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
by: Futami, Hayato, et al.
Published: (2024)
On The Landscape of Spoken Language Models: A Comprehensive Survey
by: Arora, Siddhant, et al.
Published: (2025)
Acoustic and Semantic Modeling of Emotion in Spoken Language
by: Dutta, Soumya
Published: (2026)
Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model
by: Xia, Kangxiang, et al.
Published: (2026)
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
by: Arora, Siddhant, et al.
Published: (2024)
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech
by: Czyżnikiewicz, Mateusz, et al.
Published: (2024)
Evaluating Hallucinations in Audio-Visual Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions
by: Park, Hansol, et al.
Published: (2025)
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
by: Xue, Hongfei, et al.
Published: (2023)
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
by: Zhang, Haoyang, et al.
Published: (2026)
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)
Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)
Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2026)
Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)