Library Catalog

Bibliographic Details
Main Authors:	Khairaldeen, Darvan Shvan; Hassani, Hossein
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.20744

Similar Items

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)

Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
by: Sorrenti, Adam
Published: (2024)

Where Are You From? Let Me Guess! Subdialect Recognition of Speeches in Sorani Kurdish
by: Isam, Sana, et al.
Published: (2024)

The First Voice Timbre Attribute Detection Challenge
by: Chen, Liping, et al.
Published: (2025)

Physics-Guided Deepfake Detection for Voice Authentication Systems
by: Mohammadi, Alireza, et al.
Published: (2025)

An Agent-Based Framework for Automated Higher-Voice Harmony Generation
by: Ganapathy, Nia D'Souza, et al.
Published: (2025)

VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
by: Kim, Yubin, et al.
Published: (2025)

Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition
by: Wang, Zihao, et al.
Published: (2025)

Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
by: Plaja-Roglans, Genís, et al.
Published: (2025)

τ-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains
by: Ray, Soham, et al.
Published: (2026)

i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents
by: Purwar, Anupam, et al.
Published: (2025)

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
by: Zhang, Xin, et al.
Published: (2024)

Proactive Detection of Voice Cloning with Localized Watermarking
by: Roman, Robin San, et al.
Published: (2024)

A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)

Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)

Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks
by: Atif, Youness
Published: (2025)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice
by: Mandai, Yuto, et al.
Published: (2025)

Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
by: Chen, Jiatao, et al.
Published: (2026)

VoiceWukong: Benchmarking Deepfake Voice Detection
by: Yan, Ziwei, et al.
Published: (2024)

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model
by: Plaja-Roglans, Genís, et al.
Published: (2025)

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
by: Zhang, Leying, et al.
Published: (2026)

SingFake: Singing Voice Deepfake Detection
by: Zang, Yongyi, et al.
Published: (2023)

Deepfake Detection of Singing Voices With Whisper Encodings
by: Sharma, Falguni, et al.
Published: (2025)

Voice Privacy from an Attribute-based Perspective
by: Rahman, Mehtab Ur, et al.
Published: (2026)

Probabilistic Verification of Voice Anti-Spoofing Models
by: Kushnir, Evgeny, et al.
Published: (2026)

AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels
by: Annabestani, Mohsen, et al.
Published: (2025)

VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection
by: Togootogtokh, Enkhtogtokh, et al.
Published: (2025)

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)

Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems
by: Kamel, Kamel, et al.
Published: (2025)

Self Voice Conversion as an Attack against Neural Audio Watermarking
by: Özer, Yigitcan, et al.
Published: (2026)

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations
by: Liao, Huan, et al.
Published: (2025)

VoiceBench: Benchmarking LLM-Based Voice Assistants
by: Chen, Yiming, et al.
Published: (2024)

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
by: Huang, Kexin, et al.
Published: (2026)

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
by: Sheng, Zhengyan, et al.
Published: (2025)

AVEX: What Matters for Animal Vocalization Encoding
by: Miron, Marius, et al.
Published: (2025)

Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)

Neural Multi-Speaker Voice Cloning for Nepali in Low-Resource Settings
by: Shrestha, Aayush M., et al.
Published: (2026)

DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
by: Arefeen, Ridwan, et al.
Published: (2026)

StyleStream: Real-Time Zero-Shot Voice Style Conversion
by: Liu, Yisi, et al.
Published: (2026)