| Main Authors: | Ferrari, Alessio, Huichapa, Thaide, Spoletini, Paola, Novielli, Nicole, Fucci, Davide, Girardi, Daniela |
|---|---|
| Format: | Preprint |
| Published: | 2021 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2104.02410 |
Similar Items
Sink or SWIM: Tackling Real-Time ASR at Scale
by: Bruzzone, Federico, et al.
Published: (2026)
TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration
by: Spanio, Matteo, et al.
Published: (2025)
Assessing the Understandability and Acceptance of Attack-Defense Trees for Modelling Security Requirements
by: Broccia, Giovanna, et al.
Published: (2024)
A Voice-based Triage for Type 2 Diabetes using a Conversational Virtual Assistant in the Home Environment
by: Summoogum, Kelvin, et al.
Published: (2024)
Enhancing Audio Generation Diversity with Visual Information
by: Xie, Zeyu, et al.
Published: (2024)
Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications
by: Altwlkany, Kemal, et al.
Published: (2024)
Neutone SDK: An Open Source Framework for Neural Audio Processing
by: Mitcheltree, Christopher, et al.
Published: (2025)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)
Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs
by: Zotter, Franz
Published: (2024)
SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)
Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)
Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)
Model Generation with LLMs: From Requirements to UML Sequence Diagrams
by: Ferrari, Alessio, et al.
Published: (2024)
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)
STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)
FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
by: Xie, Zeyu, et al.
Published: (2024)
Non-Invasive Suicide Risk Prediction Through Speech Analysis
by: Amiriparian, Shahin, et al.
Published: (2024)
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
by: Xie, Zeyu, et al.
Published: (2024)
Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures
by: K. V, Nithinkumar, et al.
Published: (2026)
Sound Safeguarding for Acoustic Measurement Using Any Sounds: Tools and Applications
by: Kawahara, Hideki, et al.
Published: (2025)
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)
CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
by: Zheng, Zihao, et al.
Published: (2026)
PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
by: Zheng, Zihao, et al.
Published: (2025)
FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)
MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
by: Ali, Zien Sheikh, et al.
Published: (2026)
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Large Language Models (LLMs) for Requirements Engineering (RE): A Systematic Literature Review
by: Zadenoori, Mohammad Amin, et al.
Published: (2025)
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)
EchoVoices: Preserving Generational Voices and Memories for Seniors and Children
by: Xu, Haiying, et al.
Published: (2025)
Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)
Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
by: Dutta, Satwik, et al.
Published: (2024)
TuneGenie: Reasoning-based LLM agents for preferential music generation
by: Pandey, Amitesh, et al.
Published: (2025)
SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)