Library Catalog

Bibliographic Details
Main Authors:	Tan, Frank Lihui; Do, Youngah
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning; Sound; Audio and Speech Processing; I.2.7
Online Access:	https://arxiv.org/abs/2407.18501

Similar Items

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)

Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset
by: Marie, Ambre, et al.
Published: (2025)

Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
by: Junczyk, Michał
Published: (2024)

SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)

Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)

Everyday Speech in the Indian Subcontinent
by: P, Utkarsh
Published: (2024)

Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
by: Kuhn, Korbinian, et al.
Published: (2024)

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)

Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
by: Dua, Karan, et al.
Published: (2025)

AQUALLM: Audio Question Answering Data Generation Using Large Language Models
by: Behera, Swarup Ranjan, et al.
Published: (2023)

SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
by: Gautam, Sushant, et al.
Published: (2024)

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)

Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)

NAAQA: A Neural Architecture for Acoustic Question Answering
by: Abdelnour, Jerome, et al.
Published: (2021)

Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)

Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)

Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)

Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
by: Fang, Qingkai, et al.
Published: (2024)

CTC-based Non-autoregressive Textless Speech-to-Speech Translation
by: Fang, Qingkai, et al.
Published: (2024)

Syllable based DNN-HMM Cantonese Speech to Text System
by: Wong, Timothy, et al.
Published: (2024)

MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
by: Yang, Xuwen
Published: (2025)

LLaMA-Omni: Seamless Speech Interaction with Large Language Models
by: Fang, Qingkai, et al.
Published: (2024)

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks
by: Villatoro-Tello, Esaú, et al.
Published: (2022)

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing
by: Rehman, Abdul, et al.
Published: (2025)

Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
by: Kuhn, Korbinian, et al.
Published: (2025)

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)

SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)

Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)