| Main Authors: | Tan, Frank Lihui; Do, Youngah |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.18501 |
Similar Items
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)
Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset
by: Marie, Ambre, et al.
Published: (2025)
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
by: Junczyk, Michał
Published: (2024)
SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)
MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)
Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)
Everyday Speech in the Indian Subcontinent
by: P, Utkarsh
Published: (2024)
Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)
Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
by: Kuhn, Korbinian, et al.
Published: (2024)
Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
by: Dua, Karan, et al.
Published: (2025)
AQUALLM: Audio Question Answering Data Generation Using Large Language Models
by: Behera, Swarup Ranjan, et al.
Published: (2023)
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
by: Gautam, Sushant, et al.
Published: (2024)
SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)
Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)
NAAQA: A Neural Architecture for Acoustic Question Answering
by: Abdelnour, Jerome, et al.
Published: (2021)
Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)
Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)
Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
by: Fang, Qingkai, et al.
Published: (2024)
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
by: Fang, Qingkai, et al.
Published: (2024)
Syllable based DNN-HMM Cantonese Speech to Text System
by: Wong, Timothy, et al.
Published: (2024)
MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
by: Yang, Xuwen
Published: (2025)
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
by: Fang, Qingkai, et al.
Published: (2024)
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks
by: Villatoro-Tello, Esaú, et al.
Published: (2022)
An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing
by: Rehman, Abdul, et al.
Published: (2025)
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
by: Kuhn, Korbinian, et al.
Published: (2025)
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)