Library Catalog

Bibliographic Details
Main Authors:	Singh, Akanksha; Chen, Yi-Ping Phoebe; Arora, Vipul
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.16751

Similar Items

BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection
by: Singh, Anup, et al.
Published: (2025)

Attention-Based Audio Embeddings for Query-by-Example
by: Singh, Anup, et al.
Published: (2022)

BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
by: Singh, Anup, et al.
Published: (2024)

Cross-Lingual Query-by-Example Spoken Term Detection: A Transformer-Based Approach
by: Allahdadi, Fatemeh, et al.
Published: (2024)

Improving Active Learning for Melody Estimation by Disentangling Uncertainties
by: Jaiswal, Aayush, et al.
Published: (2025)

Explainable Deep Learning Analysis for Raga Identification in Indian Art Music
by: Singh, Parampreet, et al.
Published: (2024)

AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events
by: Dutta, Sagar, et al.
Published: (2025)

Identification and Clustering of Unseen Ragas in Indian Art Music
by: Singh, Parampreet, et al.
Published: (2024)

Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy
by: Kumar, Sumit, et al.
Published: (2026)

SyncNet: correlating objective for time delay estimation in audio signals
by: Raina, Akshay, et al.
Published: (2022)

Uncertainty Quantification in Melody Estimation using Histogram Representation
by: Saxena, Kavya Ranjan, et al.
Published: (2025)

Meta-learning-based percussion transcription and tāla identification from low-resource audio
by: Kodag, Rahul Bapusaheb, et al.
Published: (2025)

Weakly Supervised Tabla Stroke Transcription via TI-SDRM: A Rhythm-Aware Lattice Rescoring Framework
by: Kodag, Rahul Bapusaheb, et al.
Published: (2026)

Written Term Detection Improves Spoken Term Detection
by: Yusuf, Bolaji, et al.
Published: (2024)

Interactive singing melody extraction based on active adaptation
by: Saxena, Kavya Ranjan, et al.
Published: (2024)

TālaGen: A System for Automatic Tāla Identification and Generation
by: Kodag, Rahul Bapusaheb, et al.
Published: (2024)

Learning from Limited Labels: Transductive Graph Label Propagation for Indian Music Analysis
by: Singh, Parampreet, et al.
Published: (2026)

Learning to Discover: A Generalized Framework for Raga Identification without Forgetting
by: Singh, Parampreet, et al.
Published: (2026)

Spoken-Term Discovery using Discrete Speech Units
by: van Niekerk, Benjamin, et al.
Published: (2024)

Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
by: Kumar, Sumit, et al.
Published: (2025)

HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages
by: Yan, Bi-Cheng, et al.
Published: (2025)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
by: Futami, Hayato, et al.
Published: (2024)

On The Landscape of Spoken Language Models: A Comprehensive Survey
by: Arora, Siddhant, et al.
Published: (2025)

Acoustic and Semantic Modeling of Emotion in Spoken Language
by: Dutta, Soumya
Published: (2026)

Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model
by: Xia, Kangxiang, et al.
Published: (2026)

On the Evaluation of Speech Foundation Models for Spoken Language Understanding
by: Arora, Siddhant, et al.
Published: (2024)

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)

Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech
by: Czyżnikiewicz, Mateusz, et al.
Published: (2024)

Evaluating Hallucinations in Audio-Visual Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions
by: Park, Hansol, et al.
Published: (2025)

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
by: Xue, Hongfei, et al.
Published: (2023)

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
by: Zhang, Haoyang, et al.
Published: (2026)

TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)

Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)

Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2026)

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)