| Main Authors: | Gulzar, Kashaf, Wagner, Dominik, Bayerl, Sebastian P., Hönig, Florian, Bocklet, Tobias, Riedhammer, Korbinian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.02027 |
Similar Items
Large Language Models for Dysfluency Detection in Stuttered Speech
by: Wagner, Dominik, et al.
Published: (2024)
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
by: Bayerl, Sebastian P., et al.
Published: (2022)
Infusing Acoustic Pause Context into Text-Based Dementia Assessment
by: Braun, Franziska, et al.
Published: (2024)
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
by: Wagner, Dominik, et al.
Published: (2024)
The PARLO Dementia Corpus: A German Multi-Center Resource for Alzheimer's Disease
by: Braun, Franziska, et al.
Published: (2026)
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
by: Simic, Christopher, et al.
Published: (2025)
Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation
by: Freisinger, Steffen, et al.
Published: (2026)
Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)
Bias and Fairness in Self-Supervised Acoustic Representations for Cognitive Impairment Detection
by: Gulzar, Kashaf, et al.
Published: (2026)
Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features
by: Freisinger, Steffen, et al.
Published: (2026)
Pitfalls and Limits in Automatic Dementia Assessment
by: Braun, Franziska, et al.
Published: (2025)
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
by: Wagner, Dominik, et al.
Published: (2023)
Multilingual Stutter Event Detection for English, German, and Mandarin Speech
by: Haas, Felix, et al.
Published: (2026)
Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)
SSDM: Scalable Speech Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
Deep Learning for Assessment of Oral Reading Fluency
by: Vaidya, Mithilesh, et al.
Published: (2024)
StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation
by: Ghosh, Suhita, et al.
Published: (2025)
Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment
by: Wade, Papa Séga, et al.
Published: (2025)
Scaling Spoken Language Models with Syllabic Speech Tokenization
by: Lee, Nicholas, et al.
Published: (2025)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
by: Wagner, Dominik, et al.
Published: (2025)
ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
by: Visser, Nicol, et al.
Published: (2026)
A Survey of Music Generation in the Context of Interaction
by: Agchar, Ismael, et al.
Published: (2024)
Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection
by: Guo, Chenxu, et al.
Published: (2025)
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
LAST: Language Model Aware Speech Tokenization
by: Turetzky, Arnon, et al.
Published: (2024)
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
by: Wagner, Dominik, et al.
Published: (2024)
Frontend Token Enhancement for Token-Based Speech Recognition
by: Ashihara, Takanori, et al.
Published: (2026)
Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
by: Kuhn, Korbinian, et al.
Published: (2024)
Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling
by: Huang, Zhijie, et al.
Published: (2026)
Neuron-Level Emotion Control in Speech-Generative Large Audio-Language Models
by: Zhao, Xiutian, et al.
Published: (2026)
TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation
by: Kumar, Shashi, et al.
Published: (2025)
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)
Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization
by: Tripathi, Kumud, et al.
Published: (2024)
Multi-stage Large Language Model Correction for Speech Recognition
by: Pu, Jie, et al.
Published: (2023)
Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
by: Ho, Tuan Vu, et al.
Published: (2025)