| Main Authors: | Tathe, Aniket, Kamble, Anand, Kumbharkar, Suyash, Bhandare, Atharva, Mitra, Anirban C. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.06183 |
Similar Items
Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
by: Tathe, Aniket, et al.
Published: (2024)
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)
SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech
by: Li, Yuqi, et al.
Published: (2025)
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2024)
Over-the-air White-box Attack on the Wav2Vec Speech Recognition Neural Network
by: Protopopov, Alexey
Published: (2026)
A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement
by: Shankar, Ravi, et al.
Published: (2024)
Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages
by: Anidjar, Or Haim, et al.
Published: (2024)
Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)
Quality of Automatic Speech Recognition -- Polish Language case study -- from Wav2Vec to Scribe ElevenLabs
by: Pietroń, Marcin, et al.
Published: (2026)
Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
by: Nguyen, Tuan, et al.
Published: (2024)
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
by: Kumar, Shashi, et al.
Published: (2024)
Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
by: Jafarzadeh, Pourya, et al.
Published: (2024)
Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0
by: Kloots, Marianne de Heer, et al.
Published: (2024)
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
by: Stuhlmann, Linus, et al.
Published: (2025)
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)
End-to-end streaming model for low-latency speech anonymization
by: Quamer, Waris, et al.
Published: (2024)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)
Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)
Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection
by: Stourbe, Theophile, et al.
Published: (2024)
End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)
A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)
XLSR-MamBo: Scaling the Hybrid Mamba-Attention Backbone for Audio Deepfake Detection
by: Ng, Kwok-Ho, et al.
Published: (2026)
XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection
by: Dat, Phuong Tuan, et al.
Published: (2025)
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection
by: Xiao, Yang, et al.
Published: (2024)
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
by: Saon, George, et al.
Published: (2025)
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)
Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)
WavLM model ensemble for audio deepfake detection
by: Combei, David, et al.
Published: (2024)
Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)
The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
by: Leglaive, Simon, et al.
Published: (2023)
WavMark: Watermarking for Audio Generation
by: Chen, Guangyu, et al.
Published: (2023)
WavCraft: Audio Editing and Generation with Large Language Models
by: Liang, Jinhua, et al.
Published: (2024)
SLM-S2ST: A multimodal language model for direct speech-to-speech translation
by: Hu, Yuxuan, et al.
Published: (2025)
WavInWav: Time-domain Speech Hiding via Invertible Neural Network
by: Fan, Wei, et al.
Published: (2025)