| Main Authors: | Kommagouni, Priyanka, Narasinga, Vamshiraghusimha, Barche, Purva, C, Sai Akarsh, Vuppala, Anil |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.17149 |
Similar Items
IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
by: Akarsh, Sai, et al.
Published: (2024)
TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation
by: Akkiraju, Bhavana, et al.
Published: (2025)
Open vocabulary keyword spotting through transfer learning from speech synthesis
by: V, Kesavaraj, et al.
Published: (2024)
Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
by: Anuprabha, M, et al.
Published: (2025)
End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
by: V, Kesavaraj, et al.
Published: (2024)
A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information
by: M, Anuprabha, et al.
Published: (2024)
Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data
by: Bandarupalli, Srihari, et al.
Published: (2025)
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings
by: Mondal, Anindita, et al.
Published: (2024)
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data
by: Pothula, Aishwarya, et al.
Published: (2025)
Z-Scores: A Metric for Linguistically Assessing Disfluency Removal
by: Teleki, Maria, et al.
Published: (2025)
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
by: Mohapatra, Payal, et al.
Published: (2024)
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)
Enhancing ASR Performance in the Medical Domain for Dravidian Languages
by: Devarakonda, Sri Charan, et al.
Published: (2026)
RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction
by: Fukumori, Takahiro, et al.
Published: (2023)
Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Feature Representations for Automatic Meerkat Vocalization Classification
by: Mahmoud, Imen Ben, et al.
Published: (2024)
Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)
Revealing the Hidden Temporal Structure of HubertSoft Embeddings based on the Russian Phonetic Corpus
by: Ananeva, Anastasia, et al.
Published: (2025)
Enhancing Speaker-Independent Dysarthric Speech Severity Classification with DSSCNet and Cross-Corpus Adaptation
by: Roy, Arnab Kumar, et al.
Published: (2025)
Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
by: Altinok, Duygu
Published: (2025)
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)
Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
by: Constas, Pavlos, et al.
Published: (2023)
The MSP-Podcast Corpus
by: Busso, Carlos, et al.
Published: (2025)
SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment
by: Eom, SooHwan, et al.
Published: (2026)
Distinctive Feature Codec: An Adaptive Efficient Speech Representation for Depression Detection
by: Zhang, Xiangyu, et al.
Published: (2025)
Evaluating Pretrained General-Purpose Audio Representations for Music Genre Classification
by: Rai, Kashish, et al.
Published: (2026)
IdolSongsJp Corpus: A Multi-Singer Song Corpus in the Style of Japanese Idol Groups
by: Suda, Hitoshi, et al.
Published: (2025)
BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus
by: Adetiba, Emmanuel, et al.
Published: (2025)
The PARLO Dementia Corpus: A German Multi-Center Resource for Alzheimer's Disease
by: Braun, Franziska, et al.
Published: (2026)
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)
Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
by: Chen, Yuanjian, et al.
Published: (2025)
TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification
by: Xiao, Zhenyuan, et al.
Published: (2024)
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)
Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
by: Talpur, Unzela, et al.
Published: (2025)
UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations
by: Haq, Attia Nafees ul, et al.
Published: (2026)
spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender
by: Devauchelle, Simon, et al.
Published: (2026)
Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
by: Chung, Soo-Whan, et al.
Published: (2025)