:: Library Catalog

Bibliographic Details
Main Authors:	Mehlman, Nick; Thebaud, Thomas; Byrd, Dani; Narayanan, Shri
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.06834

Similar Items

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)

Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices
by: Feng, Tiantian, et al.
Published: (2025)

Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)

On the Relationship between Accent Strength and Articulatory Features
by: Huang, Kevin, et al.
Published: (2025)

Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
by: Joshi, Sonal, et al.
Published: (2024)

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition
by: Park, Jay, et al.
Published: (2025)

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan
by: Xu, Anfeng, et al.
Published: (2026)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Data Efficient Child-Adult Speaker Diarization with Simulated Conversations
by: Xu, Anfeng, et al.
Published: (2024)

Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)

Phone Duration Modeling for Speaker Age Estimation in Children
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2021)

Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM
by: Thebaud, Thomas, et al.
Published: (2025)

Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
by: Feng, Tiantian, et al.
Published: (2025)

VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
by: Lin, Yuke, et al.
Published: (2024)

A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR
by: Morrone, Giovanni, et al.
Published: (2024)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

Cochleagram-based Noise Adapted Speaker Identification System for Distorted Speech
by: Ahmed, Sabbir, et al.
Published: (2025)

DNN based HRIRs Identification with a Continuously Rotating Speaker Array
by: Ko, Byeong-Yun, et al.
Published: (2025)

On the Role of Spatial Features in Foundation-Model-Based Speaker Diarization
by: Deegen, Marc, et al.
Published: (2026)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
by: Chen, Zhiyong, et al.
Published: (2026)

Noise-robust Speech Separation with Fast Generative Correction
by: Wang, Helin, et al.
Published: (2024)

Multi-Label Training for Text-Independent Speaker Identification
by: Xue, Yuqi
Published: (2022)

ARTI-6: Towards Six-dimensional Articulatory Speech Encoding
by: Lee, Jihwan, et al.
Published: (2025)

Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
by: Chen, Zhiyong, et al.
Published: (2024)

Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec
by: Lee, Junhyeok, et al.
Published: (2026)

Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
by: McKnight, Simon W., et al.
Published: (2023)

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
by: C, Kishan K, et al.
Published: (2022)

Magnitude and Phase-based Feature Fusion Using Co-attention Mechanism for Speaker recognition
by: Su, Rongfeng, et al.
Published: (2025)

Study on Inter and Intra Speaker Variability in Speaker Recognition
by: Okhotnikov, Anton, et al.
Published: (2024)

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)

Generating Rhythm Game Music with Jukebox
by: Yan, Nicholas
Published: (2023)

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)

Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)

Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference
by: Zhang, Yanzhe, et al.
Published: (2024)

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Target Speaker Lipreading by Audio-Visual Self-Distillation Pretraining and Speaker Adaptation
by: Zhang, Jing-Xuan, et al.
Published: (2025)

Joint Optimization of Speaker and Spoof Detectors for Spoofing-Robust Automatic Speaker Verification
by: Kurnaz, Oğuzhan, et al.
Published: (2025)