| Main Authors: | Ani, Saja Al; Cleland, Joanne; Zoha, Ahmed |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.17482 |
Similar Items
Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
by: Park, Se Jin, et al.
Published: (2023)
GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation
by: Andrade-Miranda, G., et al.
Published: (2024)
ICASSP 2024 Speech Signal Improvement Challenge
by: Ristea, Nicolae Catalin, et al.
Published: (2024)
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
by: Nguyen, Khanh-Binh, et al.
Published: (2024)
Shushing! Let's Imagine an Authentic Speech from the Silent Video
by: Ye, Jiaxin, et al.
Published: (2025)
Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism
by: D., Quang-Anh N., et al.
Published: (2024)
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
by: Park, Young-Hu, et al.
Published: (2025)
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
by: Kim, Ji-Hoon, et al.
Published: (2025)
Improving Bird Classification with Primary Color Additives
by: R, Ezhini Rasendiran, et al.
Published: (2025)
CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning
by: Groot, Sjoerd, et al.
Published: (2024)
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement
by: Wang, Jinting, et al.
Published: (2025)
Emotional Face-to-Speech
by: Ye, Jiaxin, et al.
Published: (2025)
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
by: Rong, Yan, et al.
Published: (2024)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
by: Yang, Qi, et al.
Published: (2023)
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation
by: Wang, Jinting, et al.
Published: (2025)
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
by: Kim, Minsu, et al.
Published: (2024)
Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
by: Li, Jialu, et al.
Published: (2023)
Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
by: Vallés-Pérez, Ivan, et al.
Published: (2023)
GaussianSpeech: Audio-Driven Gaussian Avatars
by: Aneja, Shivangi, et al.
Published: (2024)
Enhancing CTC-Based Visual Speech Recognition
by: Laux, Hendrik, et al.
Published: (2024)
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
by: Wu, Renjie, et al.
Published: (2023)
Input Conditioned Layer Dropping in Speech Foundation Models
by: Hannan, Abdul, et al.
Published: (2025)
Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes
by: Ryu, Hyeonggon, et al.
Published: (2025)
Spiking Structured State Space Model for Monaural Speech Enhancement
by: Du, Yu, et al.
Published: (2023)
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
by: Jung, Jongmin, et al.
Published: (2025)
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
by: Cappellazzo, Umberto, et al.
Published: (2025)
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
by: Liu, Zehua, et al.
Published: (2025)
Segmenting Collision Sound Sources in Egocentric Videos
by: Parida, Kranti Kumar, et al.
Published: (2025)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
by: Kang, Minki, et al.
Published: (2023)
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs
by: Anand, et al.
Published: (2025)
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
by: Liu, Lei, et al.
Published: (2024)
Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)
End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
by: Di Pierno, Andrea, et al.
Published: (2025)
Audio-Visual Segmentation via Unlabeled Frame Exploitation
by: Liu, Jinxiang, et al.
Published: (2024)
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)