Library Catalog

Bibliographic Details
Main Authors:	Rutowski, Tomasz; Harati, Amir; Lu, Yang; Shriberg, Elizabeth
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2501.00608

Similar Items

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)

Robust Speech and Natural Language Processing Models for Depression Screening
by: Lu, Y., et al.
Published: (2024)

Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language
by: Rutowski, Tomek, et al.
Published: (2024)

DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)

Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
by: Peng, Yifan, et al.
Published: (2024)

Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features
by: van Rensburg, Kyle Janse, et al.
Published: (2026)

DiariST: Streaming Speech Translation with Speaker Diarization
by: Yang, Mu, et al.
Published: (2023)

Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning
by: Rutowski, Tomasz, et al.
Published: (2024)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
by: Wu, Junkai, et al.
Published: (2024)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)

Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)

Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
by: Farhadipour, Aref, et al.
Published: (2023)

Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
by: Kashiwagi, Yosuke, et al.
Published: (2024)

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
by: Mi, Jinyi, et al.
Published: (2024)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control
by: Blatt, Alexander, et al.
Published: (2024)

In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
by: Roll, Nathan, et al.
Published: (2025)

SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)

Automatic Speech Recognition System-Independent Word Error Rate Estimation
by: Park, Chanho, et al.
Published: (2024)

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)

Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
by: Lu, Ke-Han, et al.
Published: (2025)

Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
by: Xu, Anfeng, et al.
Published: (2024)

Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)

Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)

Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)