:: Library Catalog

Bibliographic Details
Main Authors:	Lee, Euihyeok; Kim, Seonghyeon; Im, SangHun; Oh, Heung-Seon; Kang, Seungwoo
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.07493

Similar Items

TARDiS : Text Augmentation for Refining Diversity and Separability
by: Kim, Kyungmin, et al.
Published: (2025)

Fusion Segment Transformer: Bi-Directional Attention Guided Fusion Network for AI-Generated Music Detection
by: Kim, Yumin, et al.
Published: (2026)

Segment Transformer: AI-Generated Music Detection via Music Structural Analysis
by: Kim, Yumin, et al.
Published: (2025)

HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)

Real-world Music Plagiarism Detection With Music Segment Transcription System
by: Go, Seonghyeon
Published: (2025)

Music Plagiarism Detection: Problem Formulation and a Segment-based Solution
by: Go, Seonghyeon, et al.
Published: (2026)

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)

A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
by: Huang, Jia-Hong, et al.
Published: (2026)

Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track
by: Yi, June Young, et al.
Published: (2025)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
by: Wiafe, Isaac, et al.
Published: (2026)

From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection
by: Liu, Ke, et al.
Published: (2026)

Alternating Approach-Putt Models for Multi-Stage Speech Enhancement
by: Jeong, Iksoon, et al.
Published: (2025)

Toward Complex-Valued Neural Networks for Waveform Generation
by: Oh, Hyung-Seok, et al.
Published: (2026)

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
by: Oh, Hyung-Seok, et al.
Published: (2024)

Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
by: Ryu, Myeonghoon, et al.
Published: (2025)

SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection
by: Jung, Kyudan, et al.
Published: (2026)

SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
by: Lee, Sangmin, et al.
Published: (2025)

Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration
by: Ko, Byung-Kwan, et al.
Published: (2025)

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)

Training Flow Matching Models with Reliable Labels via Self-Purification
by: Kim, Hyeongju, et al.
Published: (2025)

TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
by: Low, Chetwin, et al.
Published: (2025)

BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference
by: Kim, Sungjae, et al.
Published: (2025)

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction
by: Yang, Cheng-Yeh, et al.
Published: (2025)

Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)

AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation
by: Lu, Yuxin, et al.
Published: (2026)

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

Deepfake Audio Detection Using Self-supervised Fusion Representations
by: Zaman, Khalid, et al.
Published: (2026)

Towards Realistic Synthetic Data for Automatic Drum Transcription
by: Melucci, Pierfrancesco, et al.
Published: (2026)

Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)

VorTEX: Various overlap ratio for Target speech EXtraction
by: Oh, Ro-hoon, et al.
Published: (2026)

RAS: a Reliability Oriented Metric for Automatic Speech Recognition
by: Huang, Wenbin, et al.
Published: (2026)

Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
by: Kim, Jaeyeon, et al.
Published: (2024)

WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
by: Kim, Jaeyeon, et al.
Published: (2025)

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
by: Choi, Youngwon, et al.
Published: (2026)

Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)

TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
by: Kim, Kyungsu, et al.
Published: (2025)