:: Library Catalog

Bibliographic Details
Main Authors:	Tathe, Aniket; Kamble, Anand; Kumbharkar, Suyash; Bhandare, Atharva; Mitra, Anirban C.
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2401.06183

Similar Items

Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
by: Tathe, Aniket, et al.
Published: (2024)

Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)

Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)

SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech
by: Li, Yuqi, et al.
Published: (2025)

ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2024)

Over-the-air White-box Attack on the Wav2Vec Speech Recognition Neural Network
by: Alexey, Protopopov
Published: (2026)

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement
by: Shankar, Ravi, et al.
Published: (2024)

Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages
by: Anidjar, Or Haim, et al.
Published: (2024)

Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)

Quality of Automatic Speech Recognition -- Polish Language case study -- from Wav2Vec to Scribe ElevenLabs
by: Pietroń, Marcin, et al.
Published: (2026)

Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)

Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
by: Nguyen, Tuan, et al.
Published: (2024)

XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
by: Kumar, Shashi, et al.
Published: (2024)

Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
by: Jafarzadeh, Pourya, et al.
Published: (2024)

Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)

Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
by: Stuhlmann, Linus, et al.
Published: (2025)

Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)

End-to-end streaming model for low-latency speech anonymization
by: Quamer, Waris, et al.
Published: (2024)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)

Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)

Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)

Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection
by: Stourbe, Theophile, et al.
Published: (2024)

End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)

A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)

XLSR-MamBo: Scaling the Hybrid Mamba-Attention Backbone for Audio Deepfake Detection
by: Ng, Kwok-Ho, et al.
Published: (2026)

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection
by: Dat, Phuong Tuan, et al.
Published: (2025)

XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection
by: Xiao, Yang, et al.
Published: (2024)

Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
by: Saon, George, et al.
Published: (2025)

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)

Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)

WavLM model ensemble for audio deepfake detection
by: Combei, David, et al.
Published: (2024)

Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)

The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
by: Leglaive, Simon, et al.
Published: (2023)

WavMark: Watermarking for Audio Generation
by: Chen, Guangyu, et al.
Published: (2023)

WavCraft: Audio Editing and Generation with Large Language Models
by: Liang, Jinhua, et al.
Published: (2024)

SLM-S2ST: A multimodal language model for direct speech-to-speech translation
by: Hu, Yuxuan, et al.
Published: (2025)

WavInWav: Time-domain Speech Hiding via Invertible Neural Network
by: Fan, Wei, et al.
Published: (2025)