:: Library Catalog

Bibliographic Details
Main Authors:	Kim, David Joohun; Anjum, Daniyal; Banerjee, Bonny; Abbasi, Omar
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2604.08412

Similar Items

Reverse Attention for Lightweight Speech Enhancement on Edge Devices
by: Ojha, Shuubham, et al.
Published: (2025)

Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?
by: Shkolnikov, Yakov Pyotr
Published: (2026)

Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)

Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks
by: He, Mingrui, et al.
Published: (2024)

Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)

Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)

Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)

Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
by: Mariotte, Théo, et al.
Published: (2024)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

M$^{3}$V: A multi-modal multi-view approach for Device-Directed Speech Detection
by: Wang, Anna, et al.
Published: (2024)

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)

DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)

A Multilingual Framework for Dysarthria: Detection, Severity Classification, Speech-to-Text, and Clean Speech Generation
by: Raghu, Ananya, et al.
Published: (2025)

Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing
by: Trachu, Thanapat, et al.
Published: (2025)

Speech Synthesis along Perceptual Voice Quality Dimensions
by: Rautenberg, Frederik, et al.
Published: (2025)

A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)

Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
by: Viakhirev, Ivan, et al.
Published: (2025)

Water Flow Detection Device Based on Sound Data Analysis and Machine Learning to Detect Water Leakage
by: Pourmehrani, Hossein, et al.
Published: (2025)

PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
by: Nassereldine, Amir, et al.
Published: (2024)

SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
by: Bargum, Anders R., et al.
Published: (2024)

Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024
by: Kunešová, Marie, et al.
Published: (2025)

End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)

VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
by: Yeom, Jiheum, et al.
Published: (2024)

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)

Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications
by: de Groot, Dimme, et al.
Published: (2025)

VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
by: Kirdey, Stanislav
Published: (2025)

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)

Freeze and Learn: Continual Learning with Selective Freezing for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality
by: Xu, Yiwen, et al.
Published: (2024)

SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
by: Hou, Yixuan, et al.
Published: (2025)

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)