Library Catalog

Bibliographic Details
Main Authors:	Khan, Hania; Khalid, Aleena Fatima; Hassan, Zaryab
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Sound
Online Access:	https://arxiv.org/abs/2401.09354

Similar Items

Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)

Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
by: Djeffal, Noussaiba, et al.
Published: (2025)

End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
by: Zeng, Wei, et al.
Published: (2024)

R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion
by: Zheng, Junjie, et al.
Published: (2025)

Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework
by: Zhang, Eric, et al.
Published: (2025)

Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)

Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
by: Shepardson, Victor, et al.
Published: (2024)

Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation
by: Khan, Haris, et al.
Published: (2025)

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)

OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation
by: Mahmud, Tanvir, et al.
Published: (2024)

Content-based Controls For Music Large Language Modeling
by: Lin, Liwei, et al.
Published: (2023)

Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
by: Medin, Lucas Block, et al.
Published: (2025)

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
by: Lee, Ching Ho, et al.
Published: (2026)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)

Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)

Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models
by: Nikolayevich, Borodin Kirill, et al.
Published: (2024)

Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation
by: Zhang, Jincheng, et al.
Published: (2025)

Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
by: Alavilli, Sagarika, et al.
Published: (2024)

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)

A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)

Real-world Music Plagiarism Detection With Music Segment Transcription System
by: Go, Seonghyeon
Published: (2025)

Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
by: Zhao, Sha, et al.
Published: (2025)

Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024
by: Guragain, Anmol, et al.
Published: (2024)

GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer
by: Loth, Jackson, et al.
Published: (2025)

Hyperdimensional Intelligent Sensing for Efficient Real-Time Audio Processing on Extreme Edge
by: Yun, Sanggeon, et al.
Published: (2025)

Go witheFlow: Real-time Emotion Driven Audio Effects Modulation
by: Dervakos, Edmund, et al.
Published: (2025)

EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response
by: Huang, Chenpei, et al.
Published: (2025)

Transferable Adversarial Attacks on Audio Deepfake Detection
by: Farooq, Muhammad Umar, et al.
Published: (2025)

GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)

Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
by: Schrüfer, Oliver, et al.
Published: (2024)

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)

Dialogue in Resonance: An Interactive Music Piece for Piano and Real-Time Automatic Transcription System
by: Bang, Hayeon, et al.
Published: (2025)

RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
by: Attia, Ahmed Adel, et al.
Published: (2025)

EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation
by: Zhu, Tianheng, et al.
Published: (2025)

Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)