| Main Authors: | Kim, David Joohun, Anjum, Daniyal, Banerjee, Bonny, Abbasi, Omar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.08412 |
Similar Items
Reverse Attention for Lightweight Speech Enhancement on Edge Devices
by: Ojha, Shuubham, et al.
Published: (2025)
Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?
by: Shkolnikov, Yakov Pyotr
Published: (2026)
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)
Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks
by: He, Mingrui, et al.
Published: (2024)
Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)
Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)
VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
by: Mariotte, Théo, et al.
Published: (2024)
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)
M$^{3}$V: A multi-modal multi-view approach for Device-Directed Speech Detection
by: Wang, Anna, et al.
Published: (2024)
Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)
DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)
A Multilingual Framework for Dysarthria: Detection, Severity Classification, Speech-to-Text, and Clean Speech Generation
by: Raghu, Ananya, et al.
Published: (2025)
Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing
by: Trachu, Thanapat, et al.
Published: (2025)
Speech Synthesis along Perceptual Voice Quality Dimensions
by: Rautenberg, Frederik, et al.
Published: (2025)
A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)
Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
by: Viakhirev, Ivan, et al.
Published: (2025)
Water Flow Detection Device Based on Sound Data Analysis and Machine Learning to Detect Water Leakage
by: Pourmehrani, Hossein, et al.
Published: (2025)
PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
by: Nassereldine, Amir, et al.
Published: (2024)
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
by: Bargum, Anders R., et al.
Published: (2024)
Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024
by: Kunešová, Marie, et al.
Published: (2025)
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
by: Yeom, Jiheum, et al.
Published: (2024)
Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)
Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications
by: de Groot, Dimme, et al.
Published: (2025)
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
by: Kirdey, Stanislav
Published: (2025)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Freeze and Learn: Continual Learning with Selective Freezing for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)
Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)
Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality
by: Xu, Yiwen, et al.
Published: (2024)
SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
by: Hou, Yixuan, et al.
Published: (2025)
Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)