:: Library Catalog

Bibliographic Details
Main Author:	Jalbert-Desforges, Fred
Format:	Preprint
Published:	2026
Subjects:	Sound Audio and Speech Processing Applications 94A17, 68T10 H.5.5; I.2.7; J.5
Online Access:	https://arxiv.org/abs/2605.06685

Similar Items

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

Window Size Versus Accuracy Experiments in Voice Activity Detectors
by: McKinnon, Max, et al.
Published: (2026)

Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)

Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

Guitar Tone Morphing by Diffusion-based Model
by: Chen, Kuan-Yu, et al.
Published: (2025)

Bloodroot: When Watermarking Turns Poisonous For Stealthy Backdoor
by: Chen, Kuan-Yu, et al.
Published: (2025)

SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
by: Donepudi, Dharma Teja
Published: (2025)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
by: Yi, Yungang, et al.
Published: (2024)

Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)

AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks
by: Maben, Leander Melroy, et al.
Published: (2025)

Reciprocal Latent Fields for Precomputed Sound Propagation
by: Seuté, Hugo, et al.
Published: (2026)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Matcha-TTS: A fast TTS architecture with conditional flow matching
by: Mehta, Shivam, et al.
Published: (2023)

Understanding the Algorithm Behind Audio Key Detection
by: Silva, Henrique Perez G.
Published: (2025)

Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition
by: Bhadra, Dipayan, et al.
Published: (2025)

Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)

SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
by: Chopra, Anuradha, et al.
Published: (2025)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
by: Opria, Joshua
Published: (2026)

M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algorithms, Probabilistic methods and GPT Models in any Progression and Time Signature
by: Poćwiardowski, Jakub, et al.
Published: (2024)

SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
by: Melechovsky, Jan, et al.
Published: (2025)

Matlab-based Epoch Extraction for Speaker Differentiation
by: Li, Kunlun, et al.
Published: (2024)

Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations
by: Combes, Paolo, et al.
Published: (2025)

The evolution of inharmonicity and noisiness in contemporary popular music
by: Deruty, Emmanuel, et al.
Published: (2024)

MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
by: Hiraki, Hirotaka, et al.
Published: (2025)

Improving French Synthetic Speech Quality via SSML Prosody Control
by: Ouali, Nassima Ould, et al.
Published: (2025)

Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)

OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)

Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)

BMdataset: A Musicologically Curated LilyPond Dataset
by: Spanio, Matteo, et al.
Published: (2026)

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond
by: Richter-Powell, Jessie, et al.
Published: (2025)