Saved in:
| Main Authors: | Pan, Yilin, Shi, Yanpei, Zhang, Yijia, Lu, Mingyu |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.07277 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Optimizing Speech Multi-View Feature Fusion through Conditional Computation
by: Shan, Weiqiao, et al.
Published: (2025)
by: Shan, Weiqiao, et al.
Published: (2025)
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
by: Wu, Chung-Wen, et al.
Published: (2024)
by: Wu, Chung-Wen, et al.
Published: (2024)
Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)
by: Attia, Ahmed Adel, et al.
Published: (2025)
Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
by: Siriwardhana, Shamane, et al.
Published: (2020)
by: Siriwardhana, Shamane, et al.
Published: (2020)
Temporal-Aware Iterative Speech Model for Dementia Detection
by: Ugwu, Chukwuemeka, et al.
Published: (2025)
by: Ugwu, Chukwuemeka, et al.
Published: (2025)
Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)
by: Wu, Jinyang, et al.
Published: (2026)
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)
by: Kang, Jiawen, et al.
Published: (2024)
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)
by: Wang, He, et al.
Published: (2024)
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)
by: Shih, Yi-Jen, et al.
Published: (2024)
AudioBERT: Audio Knowledge Augmented Language Model
by: Ok, Hyunjong, et al.
Published: (2024)
by: Ok, Hyunjong, et al.
Published: (2024)
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
by: Shi, Hao, et al.
Published: (2025)
by: Shi, Hao, et al.
Published: (2025)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)
by: Le, Chenyang, et al.
Published: (2024)
An Investigation Into Explainable Audio Hate Speech Detection
by: An, Jinmyeong, et al.
Published: (2024)
by: An, Jinmyeong, et al.
Published: (2024)
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
by: Wang, Siyin, et al.
Published: (2025)
by: Wang, Siyin, et al.
Published: (2025)
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
by: Hu, Shujie, et al.
Published: (2024)
Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)
by: Luo, Jiachen, et al.
Published: (2025)
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)
by: Seide, Frank, et al.
Published: (2024)
A Benchmark for Early-stage Parkinson's Disease Detection from Speech
by: Zhong, Terry Yi, et al.
Published: (2026)
by: Zhong, Terry Yi, et al.
Published: (2026)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
by: Li, Yuxin, et al.
Published: (2025)
Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)
by: Gao, Lingyun, et al.
Published: (2025)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
by: Choi, Yerin, et al.
Published: (2024)
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
by: Dai, Yusheng, et al.
Published: (2023)
by: Dai, Yusheng, et al.
Published: (2023)
Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech
by: Lall, Vishakha, et al.
Published: (2025)
by: Lall, Vishakha, et al.
Published: (2025)
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)
by: Zhang, Shaolei, et al.
Published: (2024)
HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)
by: Ahn, Hyebin, et al.
Published: (2025)
A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification
by: Pan, Yue, et al.
Published: (2025)
by: Pan, Yue, et al.
Published: (2025)
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
by: Ognjen, et al.
Published: (2024)
by: Ognjen, et al.
Published: (2024)
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)
by: Zhang, Yuhao, et al.
Published: (2025)
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)
by: Liu, Tianchi, et al.
Published: (2024)
Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)
by: Chen, William, et al.
Published: (2024)
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
by: Papi, Sara, et al.
Published: (2024)
by: Papi, Sara, et al.
Published: (2024)
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
by: Sudarshan, Ankitha, et al.
Published: (2023)
by: Sudarshan, Ankitha, et al.
Published: (2023)
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)
by: Gong, Hongyu, et al.
Published: (2024)
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
by: Yu, Wenyi, et al.
Published: (2024)
by: Yu, Wenyi, et al.
Published: (2024)
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)
by: Wang, Peidong, et al.
Published: (2025)
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)
by: Zhang, Kuiyuan, et al.
Published: (2024)
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)
by: Dai, Dongyang, et al.
Published: (2025)
WavLLM: Towards Robust and Adaptive Speech Large Language Model
by: Hu, Shujie, et al.
Published: (2024)
by: Hu, Shujie, et al.
Published: (2024)
Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis
by: Ye, Zongli, et al.
Published: (2025)
by: Ye, Zongli, et al.
Published: (2025)
Similar Items
-
Optimizing Speech Multi-View Feature Fusion through Conditional Computation
by: Shan, Weiqiao, et al.
Published: (2025) -
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
by: Wu, Chung-Wen, et al.
Published: (2024) -
Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025) -
Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
by: Siriwardhana, Shamane, et al.
Published: (2020) -
Temporal-Aware Iterative Speech Model for Dementia Detection
by: Ugwu, Chukwuemeka, et al.
Published: (2025)