:: Library Catalog

Bibliographic Details
Main Author:	Fayet, Mateo
Format:	Preprint
Published:	2024
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2404.10578

Similar Items

Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)

THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)

A framework of text-dependent speaker verification for chinese numerical string corpus
by: Zheng, Litong, et al.
Published: (2024)

Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)

A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
by: Hsu, Yicheng, et al.
Published: (2024)

DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition
by: Jing, Xin, et al.
Published: (2024)

Réduire le bruit grâce à la réalité augmentée sonore -- Auditory Concealer
by: Boukhemia, Clara
Published: (2025)

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)

Mathematics of the MML functional quantizer modules for VCV Rack software synthesizer
by: Schneider, Maxwell, et al.
Published: (2024)

Déréverbération non-supervisée de la parole par modèle hybride
by: Bahrman, Louis, et al.
Published: (2025)

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
by: Beguš, Gašper, et al.
Published: (2023)

Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
by: Nguyen, Tuan Nam, et al.
Published: (2024)

Del Visual al Auditivo: Sonorización de Escenas Guiada por Imagen
by: Sánchez, María, et al.
Published: (2024)

Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching
by: Hayes, Ben, et al.
Published: (2025)

A corpus-based investigation of pitch contours of monosyllabic words in conversational Taiwan Mandarin
by: Jin, Xiaoyun, et al.
Published: (2024)

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
by: Wang, Yi-Wei, et al.
Published: (2023)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

dCoNNear: An Artifact-Free Neural Network Architecture for Closed-loop Audio Signal Processing
by: Wen, Chuan, et al.
Published: (2025)

NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)

Filtro Adaptativo y Modulo de Grabacion en Dispositivo Para Mejora en la Calidad de Audicion
by: Torres, Carlos Elihu Palomino, et al.
Published: (2025)

Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
by: Piñeiro-Martín, Andrés, et al.
Published: (2024)

Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training
by: Ogura, Ryoya, et al.
Published: (2024)

Personalized Voice Synthesis through Human-in-the-Loop Coordinate Descent
by: Tian, Yusheng, et al.
Published: (2024)

BERP: A Blind Estimator of Room Parameters for Single-Channel Noisy Speech Signals
by: Wang, Lijun, et al.
Published: (2024)

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)

A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation
by: Wang, Pingjie, et al.
Published: (2024)

Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)

Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)

Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
by: Miao, Xiaoxiao, et al.
Published: (2024)

Audiovisual angle and voice incongruence do not affect audiovisual verbal short-term memory in virtual reality
by: Ermert, Cosima A., et al.
Published: (2024)

I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception
by: Zhang, Jiawei, et al.
Published: (2024)

PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
by: Hu, Jinbo, et al.
Published: (2024)

CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
by: Peng, Junyi, et al.
Published: (2024)

Spatial-Temporal Activity-Informed Diarization and Separation
by: Hsu, Yicheng, et al.
Published: (2024)

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
by: Xu, Linping, et al.
Published: (2024)

The role of direct sound spherical harmonics representation in externalization using binaural reproduction
by: Miller, Eran, et al.
Published: (2024)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques
by: Zhao, Changjiang, et al.
Published: (2024)