| Main Authors: | Lee, Euihyeok, Kim, Seonghyeon, Im, SangHun, Oh, Heung-Seon, Kang, Seungwoo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.07493 |
Similar Items
TARDiS : Text Augmentation for Refining Diversity and Separability
by: Kim, Kyungmin, et al.
Published: (2025)
Fusion Segment Transformer: Bi-Directional Attention Guided Fusion Network for AI-Generated Music Detection
by: Kim, Yumin, et al.
Published: (2026)
Segment Transformer: AI-Generated Music Detection via Music Structural Analysis
by: Kim, Yumin, et al.
Published: (2025)
HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)
Real-world Music Plagiarism Detection With Music Segment Transcription System
by: Go, Seonghyeon
Published: (2025)
Music Plagiarism Detection: Problem Formulation and a Segment-based Solution
by: Go, Seonghyeon, et al.
Published: (2026)
ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)
A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
by: Huang, Jia-Hong, et al.
Published: (2026)
Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track
by: Yi, June Young, et al.
Published: (2025)
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)
Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
by: Wiafe, Isaac, et al.
Published: (2026)
From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection
by: Liu, Ke, et al.
Published: (2026)
Alternating Approach-Putt Models for Multi-Stage Speech Enhancement
by: Jeong, Iksoon, et al.
Published: (2025)
Toward Complex-Valued Neural Networks for Waveform Generation
by: Oh, Hyung-Seok, et al.
Published: (2026)
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
by: Oh, Hyung-Seok, et al.
Published: (2024)
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
by: Ryu, Myeonghoon, et al.
Published: (2025)
SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection
by: Jung, Kyudan, et al.
Published: (2026)
SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
by: Lee, Sangmin, et al.
Published: (2025)
Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration
by: Ko, Byung-Kwan, et al.
Published: (2025)
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)
Training Flow Matching Models with Reliable Labels via Self-Purification
by: Kim, Hyeongju, et al.
Published: (2025)
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
by: Low, Chetwin, et al.
Published: (2025)
BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference
by: Kim, Sungjae, et al.
Published: (2025)
DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction
by: Yang, Cheng-Yeh, et al.
Published: (2025)
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)
AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation
by: Lu, Yuxin, et al.
Published: (2026)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
Deepfake Audio Detection Using Self-supervised Fusion Representations
by: Zaman, Khalid, et al.
Published: (2026)
Towards Realistic Synthetic Data for Automatic Drum Transcription
by: Melucci, Pierfrancesco, et al.
Published: (2026)
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)
VorTEX: Various overlap ratio for Target speech EXtraction
by: Oh, Ro-hoon, et al.
Published: (2026)
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
by: Huang, Wenbin, et al.
Published: (2026)
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
by: Kim, Jaeyeon, et al.
Published: (2024)
WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
by: Kim, Jaeyeon, et al.
Published: (2025)
HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)
ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
by: Choi, Youngwon, et al.
Published: (2026)
Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
by: Kim, Kyungsu, et al.
Published: (2025)