:: Library Catalog

Bibliographic Details
Main Authors:	Kahsu, Ataklti; Teferra, Solomon
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing; Machine Learning; Sound; 68T50 (Primary); H.1.2
Online Access:	https://arxiv.org/abs/2402.04254

Similar Items

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)

Real-time CARFAC Cochlea Model Acceleration on FPGA for Underwater Acoustic Sensing Systems
by: Bremer, Bram, et al.
Published: (2025)

Training for Speech Recognition on Coprocessors
by: Baunsgaard, Sebastian, et al.
Published: (2020)

Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)

SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
by: Chopra, Anuradha, et al.
Published: (2025)

Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS
by: Ethiraj, Vignesh, et al.
Published: (2025)

Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
by: Li, Yue, et al.
Published: (2024)

Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
by: Li, Jingyu, et al.
Published: (2025)

Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition
by: Bhadra, Dipayan, et al.
Published: (2025)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

SARA: Stress Test Reasoning in Audio Deepfake Detection
by: Nguyen, Binh, et al.
Published: (2026)

Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
by: Rijal, S., et al.
Published: (2023)

Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
by: Ivanov, Petar, et al.
Published: (2023)

Developing Acoustic Models for Automatic Speech Recognition in Swedish
by: Salvi, Giampiero
Published: (2024)

Unified speech and gesture synthesis using flow matching
by: Mehta, Shivam, et al.
Published: (2023)

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond
by: Richter-Powell, Jessie, et al.
Published: (2025)

Multi-blank Transducers for Speech Recognition
by: Xu, Hainan, et al.
Published: (2022)

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)

Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)

Drax: Speech Recognition with Discrete Flow Matching
by: Navon, Aviv, et al.
Published: (2025)

Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)

Keyword-Guided Adaptation of Automatic Speech Recognition
by: Shamsian, Aviv, et al.
Published: (2024)

SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)

AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks
by: Maben, Leander Melroy, et al.
Published: (2025)

UniGlyph: A Seven-Segment Script for Universal Language Representation
by: Sherin, G. V. Bency, et al.
Published: (2024)

CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech
by: Cheng, Jiali, et al.
Published: (2024)

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)

Noise-Robust Keyword Spotting through Self-supervised Pretraining
by: Mørk, Jacob, et al.
Published: (2024)

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Enhancing Speech Emotion Recognition Through Differentiable Architecture Search
by: Rajapakshe, Thejan, et al.
Published: (2023)

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)

STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)

Comparison of parameters of vowel sounds of russian and english languages
by: Fedoseev, V. I., et al.
Published: (2024)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)