| Main Authors: | Xie, Yudong, Han, Zhifeng, Xiao, Qinfan, Liang, Liwei, Tao, Lu-Qi, Ren, Tian-Ling |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.17829 |
Similar Items
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
by: Benster, Tyler, et al.
Published: (2024)
Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning
by: Dong, Xuefu, et al.
Published: (2025)
AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals
by: Zhou, Dongliang, et al.
Published: (2025)
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
by: Hou, Junfeng, et al.
Published: (2024)
Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
by: Chen, Youjun, et al.
Published: (2025)
Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)
Directional Source Separation for Robust Speech Recognition on Smart Glasses
by: Feng, Tiantian, et al.
Published: (2023)
Automatic Speech Recognition with BERT and CTC Transformers: A Review
by: Djeffal, Noussaiba, et al.
Published: (2024)
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)
Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors
by: Nowrin, Sadia, et al.
Published: (2024)
Adapting Whisper for Lightweight and Efficient Automatic Speech Recognition of Children for On-device Edge Applications
by: Dutta, Satwik, et al.
Published: (2025)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)
STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
by: Chang, Yi, et al.
Published: (2024)
Enhancing CTC-Based Visual Speech Recognition
by: Laux, Hendrik, et al.
Published: (2024)
Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
by: Nishida, Naoto, et al.
Published: (2025)
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)
Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers
by: Mishra, Ruchik, et al.
Published: (2024)
Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs
by: Arya, Lalaram, et al.
Published: (2026)
Psychophysiology-aided Perceptually Fluent Speech Analysis of Children Who Stutter
by: Xiao, Yi, et al.
Published: (2022)
Human Feedback Driven Dynamic Speech Emotion Recognition
by: Fedorov, Ilya, et al.
Published: (2025)
Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment
by: Zhao, Zhixian, et al.
Published: (2024)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
Cluster-to-Predict Affect Contours from Speech
by: Kuşçu, Gökhan, et al.
Published: (2024)
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
by: Yu, Luca Jiang-Tao, et al.
Published: (2024)
Toward using Speech to Sense Student Emotion in Remote Learning Environments
by: Vyas, Sargam, et al.
Published: (2026)
Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction
by: Hou, Yuanbo, et al.
Published: (2024)
Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
by: Khanday, Owais Mujtaba, et al.
Published: (2025)
SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment
by: Eom, SooHwan, et al.
Published: (2026)
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
by: Bartelds, Martijn, et al.
Published: (2025)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)
IR-UWB Radar-Based Contactless Silent Speech Recognition with Attention-Enhanced Temporal Convolutional Networks
by: Lee, Sunghwa, et al.
Published: (2025)
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
by: Li, Yue, et al.
Published: (2024)
Towards Temporally Explainable Dysarthric Speech Clarity Assessment
by: Park, Seohyun, et al.
Published: (2025)
SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding
by: Wang, Hongbin, et al.
Published: (2025)
VoiceX: A Text-To-Speech Framework for Custom Voices
by: Mertes, Silvan, et al.
Published: (2024)
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)