| Main Authors: | Ok, Seaone, Choi, Min Jun, Kim, Eungbeom, Han, Seungu, Lee, Kyogu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08293 |
Similar Items
Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)
Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)
Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)
Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion
by: Lim, DongHoon, et al.
Published: (2025)
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)
String Sound Synthesizer on GPU-accelerated Finite Difference Scheme
by: Lee, Jin Woo, et al.
Published: (2023)
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
by: Kim, Sungnyun, et al.
Published: (2024)
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
by: Dai, Yusheng, et al.
Published: (2023)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
by: Su, Fei, et al.
Published: (2026)
Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)
Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)
Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
by: Liu, Zehua, et al.
Published: (2024)
Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)
UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
by: Gan, Chong-Xin, et al.
Published: (2026)
Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning
by: Lee, Wonjun, et al.
Published: (2024)
FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
by: Kim, Jongsuk, et al.
Published: (2025)
Wavespace: A Highly Explorable Wavetable Generator
by: Lee, Hazounne, et al.
Published: (2024)
Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)
Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids
by: Saleem, Nasir, et al.
Published: (2025)
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition
by: Kim, Sungnyun, et al.
Published: (2025)
An Investigation Into Explainable Audio Hate Speech Detection
by: An, Jinmyeong, et al.
Published: (2024)
Music De-limiter Networks via Sample-wise Gain Inversion
by: Jeon, Chang-Bin, et al.
Published: (2023)
Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation
by: Hsieh, Tsun-An, et al.
Published: (2024)
Scalable Frameworks for Real-World Audio-Visual Speech Recognition
by: Kim, Sungnyun
Published: (2025)
Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
by: Ren, Wenze, et al.
Published: (2024)
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
by: Lee, Sungho, et al.
Published: (2024)
Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)