Library Catalog

Bibliographic Details
Main Authors:	Ferrari, Alessio; Huichapa, Thaide; Spoletini, Paola; Novielli, Nicole; Fucci, Davide; Girardi, Daniela
Format:	Preprint
Published:	2021
Subjects:	Software Engineering; Machine Learning; Sound; Audio and Speech Processing; 68N30; D.2.1; D.2.2
Online Access:	https://arxiv.org/abs/2104.02410

Similar Items

Sink or SWIM: Tackling Real-Time ASR at Scale
by: Bruzzone, Federico, et al.
Published: (2026)

TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration
by: Spanio, Matteo, et al.
Published: (2025)

Assessing the Understandability and Acceptance of Attack-Defense Trees for Modelling Security Requirements
by: Broccia, Giovanna, et al.
Published: (2024)

A Voice-based Triage for Type 2 Diabetes using a Conversational Virtual Assistant in the Home Environment
by: Summoogum, Kelvin, et al.
Published: (2024)

Enhancing Audio Generation Diversity with Visual Information
by: Xie, Zeyu, et al.
Published: (2024)

Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications
by: Altwlkany, Kemal, et al.
Published: (2024)

Neutone SDK: An Open Source Framework for Neural Audio Processing
by: Mitcheltree, Christopher, et al.
Published: (2025)

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)

Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs
by: Zotter, Franz
Published: (2024)

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)

Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)

Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)

Model Generation with LLMs: From Requirements to UML Sequence Diagrams
by: Ferrari, Alessio, et al.
Published: (2024)

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)

FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
by: Xie, Zeyu, et al.
Published: (2024)

Non-Invasive Suicide Risk Prediction Through Speech Analysis
by: Amiriparian, Shahin, et al.
Published: (2024)

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
by: Xie, Zeyu, et al.
Published: (2024)

Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures
by: K. V, Nithinkumar, et al.
Published: (2026)

Sound Safeguarding for Acoustic Measurement Using Any Sounds: Tools and Applications
by: Kawahara, Hideki, et al.
Published: (2025)

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
by: Zheng, Zihao, et al.
Published: (2026)

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
by: Zheng, Zihao, et al.
Published: (2025)

FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)

MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
by: Ali, Zien Sheikh, et al.
Published: (2026)

Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Large Language Models (LLMs) for Requirements Engineering (RE): A Systematic Literature Review
by: Zadenoori, Mohammad Amin, et al.
Published: (2025)

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)

EchoVoices: Preserving Generational Voices and Memories for Seniors and Children
by: Xu, Haiying, et al.
Published: (2025)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
by: Dutta, Satwik, et al.
Published: (2024)

TuneGenie: Reasoning-based LLM agents for preferential music generation
by: Pandey, Amitesh, et al.
Published: (2025)

SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)

Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)