| Main Authors: | Saxena, Kavya Ranjan, Arora, Vipul |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.07599 |
Similar Items
Uncertainty Quantification in Melody Estimation using Histogram Representation
by: Saxena, Kavya Ranjan, et al.
Published: (2025)
Attention-Based Audio Embeddings for Query-by-Example
by: Singh, Anup, et al.
Published: (2022)
DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)
Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)
Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
by: Kumar, Sumit, et al.
Published: (2025)
An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)
An adaptive filter bank based neural network approach for time delay estimation and speech enhancement
by: Ma, Lu
Published: (2025)
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)
Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
by: Ranjan, Rishabh, et al.
Published: (2025)
SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
by: Ranjan, Rishabh, et al.
Published: (2025)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)
Sample adaptive data augmentation with progressive scheduling
by: Lu, Hongxuan, et al.
Published: (2024)
Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)
Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Interaural time difference loss for binaural target sound extraction
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
by: Dutta, Bikash, et al.
Published: (2025)
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
Complexity boosted adaptive training for better low resource ASR performance
by: Lu, Hongxuan, et al.
Published: (2024)
Improving fairness in speaker verification via Group-adapted Fusion Network
by: Shen, Hua, et al.
Published: (2022)
Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori
by: Hu, Hu, et al.
Published: (2024)
The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
by: Leglaive, Simon, et al.
Published: (2023)
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?
by: Hiroe, Atsuo, et al.
Published: (2024)
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)
Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
musif: a Python package for symbolic music feature extraction
by: Llorens, Ana, et al.
Published: (2023)
Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Unlocking Large Audio-Language Models for Interactive Language Learning
by: Liu, Hongfu, et al.
Published: (2026)
Accelerated Interactive Auralization of Highly Reverberant Spaces using Graphics Hardware
by: Rosseel, Hannes, et al.
Published: (2025)
From Independence to Interaction: Speaker-Aware Simulation of Multi-Speaker Conversational Timing
by: Gedeon, Máté, et al.
Published: (2025)
Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network
by: Gao, Yuan, et al.
Published: (2025)
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
by: Xin, Yifei, et al.
Published: (2023)