:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Moon, Seokhoon, Jung, Kyudan, Choo, Jaegul
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2603.05302
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection
by: Jung, Kyudan, et al.
Published: (2026)

Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
by: Jung, Kyudan, et al.
Published: (2026)

Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts
by: Park, ChaeHun, et al.
Published: (2024)

MathReader : Text-to-Speech for Mathematical Documents
by: Hyeon, Sieun, et al.
Published: (2025)

Improving Design of Input Condition Invariant Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

Layer-wise Analysis for Quality of Multilingual Synthesized Speech
by: Cooper, Erica, et al.
Published: (2025)

Enhancing ASR Performance through OCR Word Frequency Analysis: Theoretical Foundations
by: Jung, Kyudan, et al.
Published: (2024)

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)

DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)

Talk to Your Slides: High-Efficiency Slide Editing via Language-Driven Structured Data Manipulation
by: Jung, Kyudan, et al.
Published: (2025)

Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)

Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement
by: Cross, Mattias, et al.
Published: (2025)

Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement
by: Serre, Thomas, et al.
Published: (2026)

Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
by: Comanducci, Luca, et al.
Published: (2024)

Universal Discrete-Domain Speech Enhancement
by: Liu, Fei, et al.
Published: (2025)

Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech
by: Reszka, Joanna, et al.
Published: (2024)

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning
by: Zhao, Shengkui, et al.
Published: (2025)

Low-latency Speech Enhancement via Speech Token Generation
by: Xue, Huaying, et al.
Published: (2023)

Which Data Matter? Embedding-Based Data Selection for Speech Recognition
by: Aldeneh, Zakaria, et al.
Published: (2026)

Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)

Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
by: Jung, Yeonjoon, et al.
Published: (2024)

Diffusion-based Frameworks for Unsupervised Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2026)

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
by: Jung, Jaemin, et al.
Published: (2024)

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)

Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement
by: Shi, Hao, et al.
Published: (2025)

Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models
by: Khanagha, Sina, et al.
Published: (2026)

A Probabilistic Generative Model for Spectral Speech Enhancement
by: Hidalgo-Araya, Marco, et al.
Published: (2026)

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment
by: Ragano, Alessandro, et al.
Published: (2023)

Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)

Personalized Speech Enhancement Without a Separate Speaker Embedding Model
by: Pärnamaa, Tanel, et al.
Published: (2024)

Robust One-step Speech Enhancement via Consistency Distillation
by: Xu, Liang, et al.
Published: (2025)

GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
by: Shetu, Shrishti Saha, et al.
Published: (2024)

Input Conditioned Layer Dropping in Speech Foundation Models
by: Hannan, Abdul, et al.
Published: (2025)

Crab: Multi Layer Contrastive Supervision to Improve Speech Emotion Recognition Under Both Acted and Natural Speech Condition
by: Ueda, Lucas H., et al.
Published: (2026)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-based Neural Speech Codec
by: Liu, Fei, et al.
Published: (2026)

Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
by: Kheir, Yassine El, et al.
Published: (2025)