Library Catalog

Bibliographic Details
Main Authors:	Huang, Wanting; Wang, Weiran
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.23171

Similar Items

NLE: Non-autoregressive LLM-based ASR by Transcript Editing
by: Dekel, Avihu, et al.
Published: (2026)

LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)

A Neural Model for Contextual Biasing Score Learning and Filtering
by: Huang, Wanting, et al.
Published: (2025)

Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)

Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)

Towards Effective and Efficient Non-autoregressive decoders for Conformer and LLM-based ASR using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2025)

RCT: Random Consistency Training for Semi-supervised Sound Event Detection
by: Shao, Nian, et al.
Published: (2021)

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
by: Fan, Ruchao, et al.
Published: (2024)

Inverse-Hessian Regularization for Continual Learning in ASR
by: Eeckt, Steven Vander, et al.
Published: (2026)

Regularized autoregressive modeling and its application to audio signal reconstruction
by: Mokrý, Ondřej, et al.
Published: (2024)

Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
by: Andrusenko, Andrei, et al.
Published: (2026)

Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
by: Xi, Yu, et al.
Published: (2025)

Improving Music Source Separation with Diffusion and Consistency Refinement
by: Karchkhadze, Tornike, et al.
Published: (2024)

MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding
by: Yen, Hao, et al.
Published: (2026)

Multimodal Consistency-Guided Reference-Free Data Selection for ASR Accent Adaptation
by: Lei, Ligong, et al.
Published: (2026)

Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
by: Carofilis, Andres, et al.
Published: (2025)

Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)

AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
by: Li, Li, et al.
Published: (2026)

Music Consistency Models
by: Fei, Zhengcong, et al.
Published: (2024)

InfiniteAudio: Infinite-Length Audio Generation with Consistency
by: Jung, Chaeyoung, et al.
Published: (2025)

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
by: Wang, Jianzong, et al.
Published: (2023)

VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge
by: Zhao, Zijing, et al.
Published: (2025)

A Non-autoregressive Model for Joint STT and TTS
by: Sunder, Vishal, et al.
Published: (2025)

An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement
by: Ho, Chun-Wei, et al.
Published: (2025)

Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Zhou, Rui, et al.
Published: (2024)

Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
by: Wagner, Dominik, et al.
Published: (2023)

A Mamba-based Network for Semi-supervised Singing Melody Extraction Using Confidence Binary Regularization
by: He, Xiaoliang, et al.
Published: (2025)

Improving ASR Contextual Biasing with Guided Attention
by: Tang, Jiyang, et al.
Published: (2024)

LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
by: Meghanani, Amit, et al.
Published: (2024)

Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
by: Fan, Ruchao, et al.
Published: (2024)

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)

Tweaking autoregressive methods for inpainting of gaps in audio signals
by: Mokrý, Ondřej, et al.
Published: (2024)

All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR
by: Moriya, Takafumi, et al.
Published: (2025)

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
by: Zhu, Xinfa, et al.
Published: (2023)

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
by: Nishigori, Shuichiro, et al.
Published: (2025)

Low-Cost Detection of Degraded Voice Clones via Source-Output Acoustic Consistency
by: Shokr, Jana, et al.
Published: (2026)