:: Library Catalog

Bibliographic Details
Main Authors:	Pan, Yilin; Shi, Yanpei; Zhang, Yijia; Lu, Mingyu
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2410.07277

Similar Items

Optimizing Speech Multi-View Feature Fusion through Conditional Computation
by: Shan, Weiqiao, et al.
Published: (2025)

Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
by: Wu, Chung-Wen, et al.
Published: (2024)

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)

Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
by: Siriwardhana, Shamane, et al.
Published: (2020)

Temporal-Aware Iterative Speech Model for Dementia Detection
by: Ugwu, Chukwuemeka, et al.
Published: (2025)

Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)

Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)

AudioBERT: Audio Knowledge Augmented Language Model
by: Ok, Hyunjong, et al.
Published: (2024)

Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
by: Shi, Hao, et al.
Published: (2025)

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)

An Investigation Into Explainable Audio Hate Speech Detection
by: An, Jinmyeong, et al.
Published: (2024)

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
by: Wang, Siyin, et al.
Published: (2025)

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)

A Benchmark for Early-stage Parkinson's Disease Detection from Speech
by: Zhong, Terry Yi, et al.
Published: (2026)

Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
by: Dai, Yusheng, et al.
Published: (2023)

Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech
by: Lall, Vishakha, et al.
Published: (2025)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)

A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification
by: Pan, Yue, et al.
Published: (2025)

Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
by: Ognjen, et al.
Published: (2024)

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)

Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)

Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
by: Papi, Sara, et al.
Published: (2024)

Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
by: Sudarshan, Ankitha, et al.
Published: (2023)

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
by: Yu, Wenyi, et al.
Published: (2024)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)

WavLLM: Towards Robust and Adaptive Speech Large Language Model
by: Hu, Shujie, et al.
Published: (2024)

Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis
by: Ye, Zongli, et al.
Published: (2025)