:: Library Catalog

Bibliographic Details
Main Authors:	Druart, Lucas; Vielzeuf, Valentin; Estève, Yannick
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Artificial Intelligence; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2311.04923

Similar Items

Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets
by: Druart, Lucas, et al.
Published: (2024)

Automatic Voice Identification after Speech Resynthesis using PPG
by: Gaudier, Thibault, et al.
Published: (2024)

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
by: Khurana, Sameer, et al.
Published: (2023)

Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction
by: Lee, Jihwan, et al.
Published: (2025)

MultiGen: Child-Friendly Multilingual Speech Generator with LLMs
by: Gao, Xiaoxue, et al.
Published: (2025)

Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
by: Gogoi, Parismita, et al.
Published: (2025)

The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
by: Ghazal, Nizar El, et al.
Published: (2025)

TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG
by: Nurfidausi, Annisaa Fitri, et al.
Published: (2025)

Language Bias in Self-Supervised Learning For Automatic Speech Recognition
by: Storey, Edward, et al.
Published: (2025)

Reading Miscue Detection in Primary School through Automatic Speech Recognition
by: Gao, Lingyun, et al.
Published: (2024)

Optimizing the role of human evaluation in LLM-based spoken document summarization systems
by: Kroll, Margaret, et al.
Published: (2024)

Wavelet GPT: Wavelet Inspired Large Language Models
by: Verma, Prateek
Published: (2024)

ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications
by: Ackva, Valentin, et al.
Published: (2025)

Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform
by: Zhu, Ting, et al.
Published: (2023)

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)

Transferable Adversarial Attacks against ASR
by: Gao, Xiaoxue, et al.
Published: (2024)

Crowdsourced Multilingual Speech Intelligibility Testing
by: Lechler, Laura, et al.
Published: (2024)

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
by: Wang, Xiaofei, et al.
Published: (2024)

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
by: Wu, Haibin, et al.
Published: (2024)

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
by: Nguyen, Tan Dat, et al.
Published: (2024)

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
by: Kuzmin, Nikita, et al.
Published: (2026)

Text-Aware Adapter for Few-Shot Keyword Spotting
by: Jung, Youngmoon, et al.
Published: (2024)

Neural Spectral Band Generation for Audio Coding
by: Choi, Woongjib, et al.
Published: (2025)

Scattering Transform for Auditory Attention Decoding
by: Pallenberg, René, et al.
Published: (2026)

Relational Proxy Loss for Audio-Text based Keyword Spotting
by: Jung, Youngmoon, et al.
Published: (2024)

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable
by: Xu, Yanze, et al.
Published: (2026)

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
by: Cai, Huanchen, et al.
Published: (2026)

Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)

MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition
by: Duret, Jarod, et al.
Published: (2024)

WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection
by: Xuan, Xi, et al.
Published: (2026)

TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

A Large-Scale Evaluation of Speech Foundation Models
by: Yang, Shu-wen, et al.
Published: (2024)

WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection
by: Xuan, Xi, et al.
Published: (2025)

A Computational Approach to Analyzing Disrupted Language in Schizophrenia: Integrating Surprisal and Coherence Measures
by: Premananth, Gowtham, et al.
Published: (2025)

Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG
by: Shams, Siavash, et al.
Published: (2025)

Visual-Aware Speech Recognition for Noisy Scenarios
by: Balaji, Lakshmipathi, et al.
Published: (2025)

A multimodal LLM for the non-invasive decoding of spoken text from brain recordings
by: Hmamouche, Youssef, et al.
Published: (2024)

Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)

Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)