:: Library Catalog

Bibliographic Details
Main Authors:	Saxena, Kavya Ranjan; Arora, Vipul
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2402.07599

Similar Items

Uncertainty Quantification in Melody Estimation using Histogram Representation
by: Saxena, Kavya Ranjan, et al.
Published: (2025)

Attention-Based Audio Embeddings for Query-by-Example
by: Singh, Anup, et al.
Published: (2022)

DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)

Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
by: Kumar, Sumit, et al.
Published: (2025)

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)

An adaptive filter bank based neural network approach for time delay estimation and speech enhancement
by: Ma, Lu
Published: (2025)

Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)

Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
by: Ranjan, Rishabh, et al.
Published: (2025)

SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
by: Ranjan, Rishabh, et al.
Published: (2025)

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)

Sample adaptive data augmentation with progressive scheduling
by: Lu, Hongxuan, et al.
Published: (2024)

Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)

Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)

Interaural time difference loss for binaural target sound extraction
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
by: Dutta, Bikash, et al.
Published: (2025)

PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

Complexity boosted adaptive training for better low resource ASR performance
by: Lu, Hongxuan, et al.
Published: (2024)

Improving fairness in speaker verification via Group-adapted Fusion Network
by: Shen, Hua, et al.
Published: (2022)

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori
by: Hu, Hu, et al.
Published: (2024)

The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
by: Leglaive, Simon, et al.
Published: (2023)

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)

Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?
by: Hiroe, Atsuo, et al.
Published: (2024)

Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)

Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)

musif: a Python package for symbolic music feature extraction
by: Llorens, Ana, et al.
Published: (2023)

Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Unlocking Large Audio-Language Models for Interactive Language Learning
by: Liu, Hongfu, et al.
Published: (2026)

Accelerated Interactive Auralization of Highly Reverberant Spaces using Graphics Hardware
by: Rosseel, Hannes, et al.
Published: (2025)

From Independence to Interaction: Speaker-Aware Simulation of Multi-Speaker Conversational Timing
by: Gedeon, Máté, et al.
Published: (2025)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network
by: Gao, Yuan, et al.
Published: (2025)

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
by: Xin, Yifei, et al.
Published: (2023)