:: Library Catalog

Bibliographic Details
Main Authors:	Ani, Saja Al; Cleland, Joanne; Zoha, Ahmed
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Computer Vision and Pattern Recognition; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2402.17482

Similar Items

Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
by: Park, Se Jin, et al.
Published: (2023)

GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation
by: Andrade-Miranda, G., et al.
Published: (2024)

ICASSP 2024 Speech Signal Improvement Challenge
by: Ristea, Nicolae Catalin, et al.
Published: (2024)

SAVE: Segment Audio-Visual Easy way using Segment Anything Model
by: Nguyen, Khanh-Binh, et al.
Published: (2024)

Shushing! Let's Imagine an Authentic Speech from the Silent Video
by: Ye, Jiaxin, et al.
Published: (2025)

Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism
by: D., Quang-Anh N., et al.
Published: (2024)

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
by: Park, Young-Hu, et al.
Published: (2025)

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
by: Kim, Ji-Hoon, et al.
Published: (2025)

Improving Bird Classification with Primary Color Additives
by: R, Ezhini Rasendiran, et al.
Published: (2025)

CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning
by: Groot, Sjoerd, et al.
Published: (2024)

See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement
by: Wang, Jinting, et al.
Published: (2025)

Emotional Face-to-Speech
by: Ye, Jiaxin, et al.
Published: (2025)

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
by: Rong, Yan, et al.
Published: (2024)

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
by: Yang, Qi, et al.
Published: (2023)

UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation
by: Wang, Jinting, et al.
Published: (2025)

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
by: Kim, Minsu, et al.
Published: (2024)

Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
by: Li, Jialu, et al.
Published: (2023)

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
by: Vallés-Pérez, Ivan, et al.
Published: (2023)

GaussianSpeech: Audio-Driven Gaussian Avatars
by: Aneja, Shivangi, et al.
Published: (2024)

Enhancing CTC-Based Visual Speech Recognition
by: Laux, Hendrik, et al.
Published: (2024)

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
by: Wu, Renjie, et al.
Published: (2023)

Input Conditioned Layer Dropping in Speech Foundation Models
by: Hannan, Abdul, et al.
Published: (2025)

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes
by: Ryu, Hyeonggon, et al.
Published: (2025)

Spiking Structured State Space Model for Monaural Speech Enhancement
by: Du, Yu, et al.
Published: (2023)

Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
by: Jung, Jongmin, et al.
Published: (2025)

MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
by: Cappellazzo, Umberto, et al.
Published: (2025)

CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
by: Liu, Zehua, et al.
Published: (2025)

Segmenting Collision Sound Sources in Egocentric Videos
by: Parida, Kranti Kumar, et al.
Published: (2025)

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)

Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
by: Kang, Minki, et al.
Published: (2023)

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs
by: Anand, et al.
Published: (2025)

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
by: Liu, Lei, et al.
Published: (2024)

Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)

End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
by: Di Pierno, Andrea, et al.
Published: (2025)

Audio-Visual Segmentation via Unlabeled Frame Exploitation
by: Liu, Jinxiang, et al.
Published: (2024)

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)