:: Library Catalog

Bibliographic Details
Main Authors:	Kommagouni, Priyanka; Narasinga, Vamshiraghusimha; Barche, Purva; C, Sai Akarsh; Vuppala, Anil
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2411.17149

Similar Items

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)

Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
by: Akarsh, Sai, et al.
Published: (2024)

TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation
by: Akkiraju, Bhavana, et al.
Published: (2025)

Open vocabulary keyword spotting through transfer learning from speech synthesis
by: V, Kesavaraj, et al.
Published: (2024)

Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
by: Anuprabha, M, et al.
Published: (2025)

End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
by: V, Kesavaraj, et al.
Published: (2024)

A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information
by: M, Anuprabha, et al.
Published: (2024)

Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data
by: Bandarupalli, Srihari, et al.
Published: (2025)

A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings
by: Mondal, Anindita, et al.
Published: (2024)

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)

End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data
by: Pothula, Aishwarya, et al.
Published: (2025)

Z-Scores: A Metric for Linguistically Assessing Disfluency Removal
by: Teleki, Maria, et al.
Published: (2025)

DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)

Missingness-resilient Video-enhanced Multimodal Disfluency Detection
by: Mohapatra, Payal, et al.
Published: (2024)

Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)

Enhancing ASR Performance in the Medical Domain for Dravidian Languages
by: Devarakonda, Sri Charan, et al.
Published: (2026)

RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction
by: Fukumori, Takahiro, et al.
Published: (2023)

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Feature Representations for Automatic Meerkat Vocalization Classification
by: Mahmoud, Imen Ben, et al.
Published: (2024)

Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)

Revealing the Hidden Temporal Structure of HubertSoft Embeddings based on the Russian Phonetic Corpus
by: Ananeva, Anastasia, et al.
Published: (2025)

Enhancing Speaker-Independent Dysarthric Speech Severity Classification with DSSCNet and Cross-Corpus Adaptation
by: Roy, Arnab Kumar, et al.
Published: (2025)

Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
by: Altinok, Duygu
Published: (2025)

Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)

Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
by: Constas, Pavlos, et al.
Published: (2023)

The MSP-Podcast Corpus
by: Busso, Carlos, et al.
Published: (2025)

SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment
by: Eom, SooHwan, et al.
Published: (2026)

Distinctive Feature Codec: An Adaptive Efficient Speech Representation for Depression Detection
by: Zhang, Xiangyu, et al.
Published: (2025)

Evaluating Pretrained General-Purpose Audio Representations for Music Genre Classification
by: Rai, Kashish, et al.
Published: (2026)

IdolSongsJp Corpus: A Multi-Singer Song Corpus in the Style of Japanese Idol Groups
by: Suda, Hitoshi, et al.
Published: (2025)

BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus
by: Adetiba, Emmanuel, et al.
Published: (2025)

The PARLO Dementia Corpus: A German Multi-Center Resource for Alzheimer's Disease
by: Braun, Franziska, et al.
Published: (2026)

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)

Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
by: Chen, Yuanjian, et al.
Published: (2025)

TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification
by: Xiao, Zhenyuan, et al.
Published: (2024)

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)

Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
by: Talpur, Unzela, et al.
Published: (2025)

UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations
by: Haq, Attia Nafees ul, et al.
Published: (2026)

spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender
by: Devauchelle, Simon, et al.
Published: (2026)

Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
by: Chung, Soo-Whan, et al.
Published: (2025)