:: Library Catalog

Bibliographic Details
Main Authors:	Ok, Seaone, Choi, Min Jun, Kim, Eungbeom, Han, Seungu, Lee, Kyogu
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.08293

Similar Items

Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)

Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)

Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)

Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion
by: Lim, DongHoon, et al.
Published: (2025)

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)

String Sound Synthesizer on GPU-accelerated Finite Difference Scheme
by: Lee, Jin Woo, et al.
Published: (2023)

Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
by: Kim, Sungnyun, et al.
Published: (2024)

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
by: Dai, Yusheng, et al.
Published: (2023)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
by: Su, Fei, et al.
Published: (2026)

Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)

Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)

Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)

Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)

AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
by: Liu, Zehua, et al.
Published: (2024)

Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
by: Gan, Chong-Xin, et al.
Published: (2026)

Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning
by: Lee, Wonjun, et al.
Published: (2024)

FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
by: Kim, Jongsuk, et al.
Published: (2025)

Wavespace: A Highly Explorable Wavetable Generator
by: Lee, Hazounne, et al.
Published: (2024)

Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)

Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids
by: Saleem, Nasir, et al.
Published: (2025)

MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition
by: Kim, Sungnyun, et al.
Published: (2025)

An Investigation Into Explainable Audio Hate Speech Detection
by: An, Jinmyeong, et al.
Published: (2024)

Music De-limiter Networks via Sample-wise Gain Inversion
by: Jeon, Chang-Bin, et al.
Published: (2023)

Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation
by: Hsieh, Tsun-An, et al.
Published: (2024)

Scalable Frameworks for Real-World Audio-Visual Speech Recognition
by: Kim, Sungnyun
Published: (2025)

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)

DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)

Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
by: Ren, Wenze, et al.
Published: (2024)

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
by: Lee, Sungho, et al.
Published: (2024)

Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)