Library Catalog

Bibliographic Details
Main Authors:	Xie, Yudong, Han, Zhifeng, Xiao, Qinfan, Liang, Liwei, Tao, Lu-Qi, Ren, Tian-Ling
Format:	Preprint
Published:	2025
Subjects:	Human-Computer Interaction; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.17829

Similar Items

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)

Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning
by: Dong, Xuefu, et al.
Published: (2025)

AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals
by: Zhou, Dongliang, et al.
Published: (2025)

CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
by: Hou, Junfeng, et al.
Published: (2024)

Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
by: Chen, Youjun, et al.
Published: (2025)

Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)

Directional Source Separation for Robust Speech Recognition on Smart Glasses
by: Feng, Tiantian, et al.
Published: (2023)

Automatic Speech Recognition with BERT and CTC Transformers: A Review
by: Djeffal, Noussaiba, et al.
Published: (2024)

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)

Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors
by: Nowrin, Sadia, et al.
Published: (2024)

Adapting Whisper for Lightweight and Efficient Automatic Speech Recognition of Children for On-device Edge Applications
by: Dutta, Satwik, et al.
Published: (2025)

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)

Enhancing CTC-Based Visual Speech Recognition
by: Laux, Hendrik, et al.
Published: (2024)

Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
by: Nishida, Naoto, et al.
Published: (2025)

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)

Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers
by: Mishra, Ruchik, et al.
Published: (2024)

Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs
by: Arya, Lalaram, et al.
Published: (2026)

Psychophysiology-aided Perceptually Fluent Speech Analysis of Children Who Stutter
by: Xiao, Yi, et al.
Published: (2022)

Human Feedback Driven Dynamic Speech Emotion Recognition
by: Fedorov, Ilya, et al.
Published: (2025)

Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment
by: Zhao, Zhixian, et al.
Published: (2024)

Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)

SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)

Cluster-to-Predict Affect Contours from Speech
by: Kuşçu, Gökhan, et al.
Published: (2024)

USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
by: Yu, Luca Jiang-Tao, et al.
Published: (2024)

Toward using Speech to Sense Student Emotion in Remote Learning Environments
by: Vyas, Sargam, et al.
Published: (2026)

Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction
by: Hou, Yuanbo, et al.
Published: (2024)

Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
by: Khanday, Owais Mujtaba, et al.
Published: (2025)

SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment
by: Eom, SooHwan, et al.
Published: (2026)

CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
by: Bartelds, Martijn, et al.
Published: (2025)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)

IR-UWB Radar-Based Contactless Silent Speech Recognition with Attention-Enhanced Temporal Convolutional Networks
by: Lee, Sunghwa, et al.
Published: (2025)

A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
by: Li, Yue, et al.
Published: (2024)

Towards Temporally Explainable Dysarthric Speech Clarity Assessment
by: Park, Seohyun, et al.
Published: (2025)

SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding
by: Wang, Hongbin, et al.
Published: (2025)

VoiceX: A Text-To-Speech Framework for Custom Voices
by: Mertes, Silvan, et al.
Published: (2024)

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)