:: Library Catalog

Bibliographic Details
Main Authors:	Foley, Sean; Lee, Jihwan; Huang, Kevin; Shi, Xuan; Lee, Yoonjeong; Goldstein, Louis; Narayanan, Shrikanth
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2509.14479

Similar Items

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition
by: Park, Jay, et al.
Published: (2025)

On the Relationship between Accent Strength and Articulatory Features
by: Huang, Kevin, et al.
Published: (2025)

Articulatory Feature Prediction from Surface EMG during Speech Production
by: Lee, Jihwan, et al.
Published: (2025)

Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
by: Feng, Tiantian, et al.
Published: (2025)

Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech
by: Nguyen, Hong, et al.
Published: (2024)

ARTI-6: Towards Six-dimensional Articulatory Speech Encoding
by: Lee, Jihwan, et al.
Published: (2025)

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)

Audio-visual child-adult speaker classification in dyadic interactions
by: Xu, Anfeng, et al.
Published: (2023)

TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
by: Feng, Tiantian, et al.
Published: (2024)

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)

VoxCog: Towards End-to-End Multilingual Cognitive Impairment Classification through Dialectal Knowledge
by: Feng, Tiantian, et al.
Published: (2026)

Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition
by: Foley, Sean, et al.
Published: (2025)

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
by: Feng, Tiantian, et al.
Published: (2023)

Learning-free L2-Accented Speech Generation using Phonological Rules
by: Lertpetchpun, Thanathai, et al.
Published: (2026)

Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
by: Feng, Tiantian, et al.
Published: (2024)

Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
by: Wang, Shih-Heng, et al.
Published: (2026)

Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)

Multi-speaker Text-to-speech Training with Speaker Anonymized Data
by: Huang, Wen-Chin, et al.
Published: (2024)

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition
by: Kwak, Doyeop, et al.
Published: (2026)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition
by: Prescott, Jordan, et al.
Published: (2026)

Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)

Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only
by: Lee, Jaejun, et al.
Published: (2026)

Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
by: Lertpetchpun, Thanathai, et al.
Published: (2026)

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)

VorTEX: Various overlap ratio for Target speech EXtraction
by: Oh, Ro-hoon, et al.
Published: (2026)

Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)

Online speaker diarization of meetings guided by speech separation
by: Gruttadauria, Elio, et al.
Published: (2024)

An Approach to Simultaneous Acquisition of Real-Time MRI Video, EEG, and Surface EMG for Articulatory, Brain, and Muscle Activity During Speech Production
by: Lee, Jihwan, et al.
Published: (2026)

Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices
by: Feng, Tiantian, et al.
Published: (2025)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)

Emotion-Aligned Contrastive Learning Between Images and Music
by: Stewart, Shanti, et al.
Published: (2023)

Phone Duration Modeling for Speaker Age Estimation in Children
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2021)

voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

VoxCare: Studying Natural Communication Behaviors of Hospital Caregivers through Wearable Sensing of Egocentric Audio
by: Feng, Tiantian, et al.
Published: (2026)