| Main Authors: | Khan, Hania; Khalid, Aleena Fatima; Hassan, Zaryab |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.09354 |
Similar Items
Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)
Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
by: Djeffal, Noussaiba, et al.
Published: (2025)
End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
by: Zeng, Wei, et al.
Published: (2024)
R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion
by: Zheng, Junjie, et al.
Published: (2025)
Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework
by: Zhang, Eric, et al.
Published: (2025)
Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)
Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
by: Shepardson, Victor, et al.
Published: (2024)
Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation
by: Khan, Haris, et al.
Published: (2025)
A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)
OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation
by: Mahmud, Tanvir, et al.
Published: (2024)
Content-based Controls For Music Large Language Modeling
by: Lin, Liwei, et al.
Published: (2023)
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
by: Medin, Lucas Block, et al.
Published: (2025)
Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)
Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
by: Lee, Ching Ho, et al.
Published: (2026)
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)
Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)
Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models
by: Nikolayevich, Borodin Kirill, et al.
Published: (2024)
Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation
by: Zhang, Jincheng, et al.
Published: (2025)
Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
by: Alavilli, Sagarika, et al.
Published: (2024)
Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)
LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)
A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)
Real-world Music Plagiarism Detection With Music Segment Transcription System
by: Go, Seonghyeon
Published: (2025)
Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
by: Zhao, Sha, et al.
Published: (2025)
Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024
by: Guragain, Anmol, et al.
Published: (2024)
GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer
by: Loth, Jackson, et al.
Published: (2025)
Hyperdimensional Intelligent Sensing for Efficient Real-Time Audio Processing on Extreme Edge
by: Yun, Sanggeon, et al.
Published: (2025)
Go witheFlow: Real-time Emotion Driven Audio Effects Modulation
by: Dervakos, Edmund, et al.
Published: (2025)
EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response
by: Huang, Chenpei, et al.
Published: (2025)
Transferable Adversarial Attacks on Audio Deepfake Detection
by: Farooq, Muhammad Umar, et al.
Published: (2025)
GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
by: Schrüfer, Oliver, et al.
Published: (2024)
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)
Dialogue in Resonance: An Interactive Music Piece for Piano and Real-Time Automatic Transcription System
by: Bang, Hayeon, et al.
Published: (2025)
RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
by: Attia, Ahmed Adel, et al.
Published: (2025)
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation
by: Zhu, Tianheng, et al.
Published: (2025)
Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)
A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)