:: Library Catalog

Bibliographic Details
Main Authors:	Raissi, Tina; Schlüter, Ralf; Ney, Hermann
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2501.04521

Similar Items

Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)

Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)

Unified Learnable 2D Convolutional Feature Extraction for ASR
by: Vieting, Peter, et al.
Published: (2025)

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
by: Yang, Zijian, et al.
Published: (2026)

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
by: Zeineldeen, Mohammad, et al.
Published: (2023)

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)

Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)

The Conformer Encoder May Reverse the Time Dimension
by: Schmitt, Robin, et al.
Published: (2024)

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)

Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis
by: Yu, Chin-Yun, et al.
Published: (2024)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)

Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
by: Jiang, Jintao, et al.
Published: (2024)

Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
by: Plaquet, Alexis, et al.
Published: (2025)

End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
by: Ghane, Mohsen, et al.
Published: (2025)

End-to-End Amp Modeling: From Data to Controllable Guitar Amplifier Models
by: Juvela, Lauri, et al.
Published: (2024)

End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data
by: Cui, Can, et al.
Published: (2023)

A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
by: Kopparapu, Sunil Kumar, et al.
Published: (2024)

Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
by: Kushwaha, Saksham Singh, et al.
Published: (2024)

End-to-End Diarization utilizing Attractor Deep Clustering
by: Palzer, David, et al.
Published: (2025)

An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
by: Chen, Yujie, et al.
Published: (2024)

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
by: Du, Zongyang, et al.
Published: (2024)

Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
by: Li, Chia-Yu, et al.
Published: (2024)

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
by: Kang, Wonjune, et al.
Published: (2022)

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
by: Landini, Federico, et al.
Published: (2023)

Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)

AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding
by: Nguyen, Nhan Duc Thanh, et al.
Published: (2024)

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)

Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
by: Comanducci, Luca, et al.
Published: (2024)

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

audio2chart: End to End Audio Transcription into playable Guitar Hero charts
by: Tripodi, Riccardo
Published: (2025)

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
by: Zhang, Lin, et al.
Published: (2024)

Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions
by: Lin, Wan, et al.
Published: (2024)