| Main Authors: | Raissi, Tina, Schlüter, Ralf, Ney, Hermann |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.04521 |
Similar Items
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)
Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)
Unified Learnable 2D Convolutional Feature Extraction for ASR
by: Vieting, Peter, et al.
Published: (2025)
Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
by: Yang, Zijian, et al.
Published: (2026)
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
by: Zeineldeen, Mohammad, et al.
Published: (2023)
On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)
The Conformer Encoder May Reverse the Time Dimension
by: Schmitt, Robin, et al.
Published: (2024)
Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)
Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis
by: Yu, Chin-Yun, et al.
Published: (2024)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
by: Jiang, Jintao, et al.
Published: (2024)
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
by: Plaquet, Alexis, et al.
Published: (2025)
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
by: Ghane, Mohsen, et al.
Published: (2025)
End-to-End Amp Modeling: From Data to Controllable Guitar Amplifier Models
by: Juvela, Lauri, et al.
Published: (2024)
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data
by: Cui, Can, et al.
Published: (2023)
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
by: Kopparapu, Sunil Kumar, et al.
Published: (2024)
Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
by: Kushwaha, Saksham Singh, et al.
Published: (2024)
End-to-End Diarization utilizing Attractor Deep Clustering
by: Palzer, David, et al.
Published: (2025)
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
by: Chen, Yujie, et al.
Published: (2024)
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
by: Du, Zongyang, et al.
Published: (2024)
Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
by: Li, Chia-Yu, et al.
Published: (2024)
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
by: Kang, Wonjune, et al.
Published: (2022)
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
by: Landini, Federico, et al.
Published: (2023)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding
by: Nguyen, Nhan Duc Thanh, et al.
Published: (2024)
WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)
Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
by: Comanducci, Luca, et al.
Published: (2024)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
audio2chart: End to End Audio Transcription into playable Guitar Hero charts
by: Tripodi, Riccardo
Published: (2025)
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
by: Zhang, Lin, et al.
Published: (2024)
Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions
by: Lin, Wan, et al.
Published: (2024)