| Main Author: | Fayet, Mateo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.10578 |
Similar Items
Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)
THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)
A framework of text-dependent speaker verification for chinese numerical string corpus
by: Zheng, Litong, et al.
Published: (2024)
Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)
A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
by: Hsu, Yicheng, et al.
Published: (2024)
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition
by: Jing, Xin, et al.
Published: (2024)
Réduire le bruit grâce à la réalité augmentée sonore -- Auditory Concealer
by: Boukhemia, Clara
Published: (2025)
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)
Mathematics of the MML functional quantizer modules for VCV Rack software synthesizer
by: Schneider, Maxwell, et al.
Published: (2024)
Déréverbération non-supervisée de la parole par modèle hybride
by: Bahrman, Louis, et al.
Published: (2025)
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
by: Beguš, Gašper, et al.
Published: (2023)
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
by: Nguyen, Tuan Nam, et al.
Published: (2024)
Del Visual al Auditivo: Sonorización de Escenas Guiada por Imagen
by: Sánchez, María, et al.
Published: (2024)
Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching
by: Hayes, Ben, et al.
Published: (2025)
A corpus-based investigation of pitch contours of monosyllabic words in conversational Taiwan Mandarin
by: Jin, Xiaoyun, et al.
Published: (2024)
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
by: Wang, Yi-Wei, et al.
Published: (2023)
dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
dCoNNear: An Artifact-Free Neural Network Architecture for Closed-loop Audio Signal Processing
by: Wen, Chuan, et al.
Published: (2025)
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)
Filtro Adaptativo y Modulo de Grabacion en Dispositivo Para Mejora en la Calidad de Audicion
by: Torres, Carlos Elihu Palomino, et al.
Published: (2025)
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
by: Piñeiro-Martín, Andrés, et al.
Published: (2024)
Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training
by: Ogura, Ryoya, et al.
Published: (2024)
Personalized Voice Synthesis through Human-in-the-Loop Coordinate Descent
by: Tian, Yusheng, et al.
Published: (2024)
BERP: A Blind Estimator of Room Parameters for Single-Channel Noisy Speech Signals
by: Wang, Lijun, et al.
Published: (2024)
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)
A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation
by: Wang, Pingjie, et al.
Published: (2024)
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)
Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
by: Miao, Xiaoxiao, et al.
Published: (2024)
Audiovisual angle and voice incongruence do not affect audiovisual verbal short-term memory in virtual reality
by: Ermert, Cosima A., et al.
Published: (2024)
I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception
by: Zhang, Jiawei, et al.
Published: (2024)
PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
by: Hu, Jinbo, et al.
Published: (2024)
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
by: Peng, Junyi, et al.
Published: (2024)
Spatial-Temporal Activity-Informed Diarization and Separation
by: Hsu, Yicheng, et al.
Published: (2024)
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
by: Xu, Linping, et al.
Published: (2024)
The role of direct sound spherical harmonics representation in externalization using binaural reproduction
by: Miller, Eran, et al.
Published: (2024)
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)
SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques
by: Zhao, Changjiang, et al.
Published: (2024)