Library Catalog

Bibliographic Details
Main Authors:	Khurana, Sameer; Dawalatabad, Nauman; Laurent, Antoine; Vicente, Luis; Gimeno, Pablo; Mingote, Victoria; Glass, James
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Artificial Intelligence; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2306.00789

Similar Items

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
by: Wang, Liming, et al.
Published: (2024)

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation
by: Kim, Miseul, et al.
Published: (2024)

Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)

SpeechMLC: Speech Multi-label Classification
by: Kim, Miseul, et al.
Published: (2025)

On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)

Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs
by: Cuervo, Santiago, et al.
Published: (2025)

TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

A Speech Production Model for Radar: Connecting Speech Acoustics with Radar-Measured Vibrations
by: Lenz, Isabella, et al.
Published: (2025)

ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
by: Yang, Shu-wen, et al.
Published: (2025)

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)

Binaural Localization Model for Speech in Noise
by: Tokala, Vikas, et al.
Published: (2025)

Speech-Based Prioritization for Schizophrenia Intervention
by: Premananth, Gowtham, et al.
Published: (2025)

Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)

Brain-Informed Speech Separation for Cochlear Implants
by: Gajecki, Tom, et al.
Published: (2026)

Speech Enhancement based on cascaded two flows
by: Lee, Seonggyu, et al.
Published: (2025)

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)

Advanced Signal Analysis in Detecting Replay Attacks for Automatic Speaker Verification Systems
by: Kuang, Lee Shih
Published: (2024)

FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)

Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction
by: Amartyaveer, et al.
Published: (2026)

Towards Improved Objective Perceptual Audio Quality Assessment -- Part 1: A Novel Data-Driven Cognitive Model
by: Delgado, Pablo M., et al.
Published: (2024)

Harmonics to the Rescue: Why Voiced Speech is Not a WSS Process
by: Bologni, Giovanni, et al.
Published: (2025)

SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)

Binaural Speech Enhancement Using Complex Convolutional Recurrent Networks
by: Tokala, Vikas, et al.
Published: (2025)

Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives
by: Premananth, Gowtham, et al.
Published: (2025)

The Overview of Segmental Durations Modification Algorithms on Speech Signal Characteristics
by: Jang, Kyeomeun, et al.
Published: (2025)

USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering
by: Wang, Zhong-Qiu
Published: (2024)

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)

Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification
by: Bhattacharjee, Susmita, et al.
Published: (2025)

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations
by: Aihara, Ryo, et al.
Published: (2025)

Impact of Microphone Array Mismatches to Learning-based Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)

DeFTAN-II: Efficient Multichannel Speech Enhancement with Subgroup Processing
by: Lee, Dongheon, et al.
Published: (2023)

Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer
by: Neri, Michael, et al.
Published: (2025)

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)

BanglaNum -- A Public Dataset for Bengali Digit Recognition from Speech
by: Mohammad, Mir Sayeed, et al.
Published: (2024)

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation
by: Wang, Zhong-Qiu
Published: (2024)

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)

HyBeam: Hybrid Microphone-Beamforming Array-Agnostic Speech Enhancement for Wearables
by: Ilan, Yuval Bar, et al.
Published: (2025)

Automatic Voice Classification Of Autistic Subjects
by: Vacca, Jessica, et al.
Published: (2024)

Zero-Bit Transmission of Adaptive Pre- and De-emphasis Filters for Speech and Audio Coding
by: Piralideh, Niloofar Omidi, et al.
Published: (2024)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)