| Main Authors: | Rutowski, Tomasz, Harati, Amir, Lu, Yang, Shriberg, Elizabeth |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.00608 |
Similar Items
Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)
Robust Speech and Natural Language Processing Models for Depression Screening
by: Lu, Y., et al.
Published: (2024)
Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language
by: Rutowski, Tomek, et al.
Published: (2024)
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)
Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
by: Peng, Yifan, et al.
Published: (2024)
Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features
by: van Rensburg, Kyle Janse, et al.
Published: (2026)
DiariST: Streaming Speech Translation with Speaker Diarization
by: Yang, Mu, et al.
Published: (2023)
Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning
by: Rutowski, Tomasz, et al.
Published: (2024)
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
by: Wu, Junkai, et al.
Published: (2024)
Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)
Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
by: Farhadipour, Aref, et al.
Published: (2023)
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
by: Kashiwagi, Yosuke, et al.
Published: (2024)
Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
by: Mi, Jinyi, et al.
Published: (2024)
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)
Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control
by: Blatt, Alexander, et al.
Published: (2024)
In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
by: Roll, Nathan, et al.
Published: (2025)
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)
Automatic Speech Recognition System-Independent Word Error Rate Estimation
by: Park, Chanho, et al.
Published: (2024)
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
by: Lu, Ke-Han, et al.
Published: (2025)
Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
by: Xu, Anfeng, et al.
Published: (2024)
Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)
Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)