Bibliographic Details
Main Authors:	Hauret, Julien; Olivier, Malo; Joubaud, Thomas; Langrenne, Christophe; Poirée, Sarah; Zimpfer, Véronique; Bavu, Éric
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Machine Learning
Online Access:	https://arxiv.org/abs/2407.11828

Similar Items

French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement
by: Joubaud, Thomas, et al.
Published: (2025)

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
by: Hauret, Julien, et al.
Published: (2023)

EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones
by: Hauret, Julien, et al.
Published: (2022)

Real-time speech enhancement in noise for throat microphone using neural audio codec as foundation model
by: Hauret, Julien, et al.
Published: (2025)

Bringing Interpretability to Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2025)

spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender
by: Devauchelle, Simon, et al.
Published: (2026)

VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark
by: Zhang, Zhe, et al.
Published: (2026)

YODAS: Youtube-Oriented Dataset for Audio and Speech
by: Li, Xinjian, et al.
Published: (2024)

Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition
by: Serrand, Coralie, et al.
Published: (2025)

Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts
by: Pelloin, Valentin, et al.
Published: (2026)

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)

CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research
by: Tao, Dehua, et al.
Published: (2024)

CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)

ODAQ: Open Dataset of Audio Quality
by: Torcoli, Matteo, et al.
Published: (2023)

Audio-Visual Speech Enhancement for Spatial Audio - Spatial-VisualVoice and the MAVE Database
by: Yaffe, Danielle, et al.
Published: (2025)

WaLi: Can Pressure Sensors in HVAC Systems Capture Human Speech?
by: Tamiti, Tarikul Islam, et al.
Published: (2025)

Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)

A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition
by: de Groot, Dimme, et al.
Published: (2026)

Cross-linguistic Prosodic Analysis of Autistic and Non-autistic Child Speech in Finnish, French and Slovak
by: Myllylä, Ida-Lotta, et al.
Published: (2026)

AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models
by: Bai, Jisheng, et al.
Published: (2024)

Speech Separation using Neural Audio Codecs with Embedding Loss
by: Yip, Jia Qi, et al.
Published: (2024)

ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals
by: E, Ameenudeen P, et al.
Published: (2026)

Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak
by: Kakouros, Sofoklis, et al.
Published: (2026)

Expanding and Analyzing ODAQ -- the Open Dataset of Audio Quality
by: Dick, Sascha, et al.
Published: (2025)

Enhancing Crowdsourced Audio for Text-to-Speech Models
by: Giraldo, José, et al.
Published: (2024)

SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
by: Zhang, Xiangyu, et al.
Published: (2025)

FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
by: Kim, Jongsuk, et al.
Published: (2025)

Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids
by: Saleem, Nasir, et al.
Published: (2025)

SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations
by: Yang, Xiaoyu, et al.
Published: (2025)

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
by: Li, Jiaqi, et al.
Published: (2024)

ASPED: An Audio Dataset for Detecting Pedestrians
by: Seshadri, Pavan, et al.
Published: (2023)

Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)

Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation
by: Hsieh, Tsun-An, et al.
Published: (2024)

Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework
by: Yang, Hsiang-Cheng, et al.
Published: (2026)

Cross-Modal Bottleneck Fusion For Noise Robust Audio-Visual Speech Recognition
by: Ok, Seaone, et al.
Published: (2026)

Rethinking Mamba in Speech Processing by Self-Supervised Models
by: Zhang, Xiangyu, et al.
Published: (2024)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)

A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation
by: Wang, Pingjie, et al.
Published: (2024)

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)