| Main Authors: | Khairaldeen, Darvan Shvan; Hassani, Hossein |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20744 |
Similar Items
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
by: Sorrenti, Adam
Published: (2024)
Where Are You From? Let Me Guess! Subdialect Recognition of Speeches in Sorani Kurdish
by: Isam, Sana, et al.
Published: (2024)
The First Voice Timbre Attribute Detection Challenge
by: Chen, Liping, et al.
Published: (2025)
Physics-Guided Deepfake Detection for Voice Authentication Systems
by: Mohammadi, Alireza, et al.
Published: (2025)
An Agent-Based Framework for Automated Higher-Voice Harmony Generation
by: Ganapathy, Nia D'Souza, et al.
Published: (2025)
VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
by: Kim, Yubin, et al.
Published: (2025)
Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition
by: Wang, Zihao, et al.
Published: (2025)
Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
by: Plaja-Roglans, Genís, et al.
Published: (2025)
$τ$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains
by: Ray, Soham, et al.
Published: (2026)
i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents
by: Purwar, Anupam, et al.
Published: (2025)
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
by: Zhang, Xin, et al.
Published: (2024)
Proactive Detection of Voice Cloning with Localized Watermarking
by: Roman, Robin San, et al.
Published: (2024)
A Real-Time Voice Activity Detection Based On Lightweight Neural
by: Jia, Jidong, et al.
Published: (2024)
Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)
Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks
by: Atif, Youness
Published: (2025)
Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)
Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice
by: Mandai, Yuto, et al.
Published: (2025)
Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
by: Chen, Jiatao, et al.
Published: (2026)
VoiceWukong: Benchmarking Deepfake Voice Detection
by: Yan, Ziwei, et al.
Published: (2024)
Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model
by: Plaja-Roglans, Genís, et al.
Published: (2025)
DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
by: Zhang, Leying, et al.
Published: (2026)
SingFake: Singing Voice Deepfake Detection
by: Zang, Yongyi, et al.
Published: (2023)
Deepfake Detection of Singing Voices With Whisper Encodings
by: Sharma, Falguni, et al.
Published: (2025)
Voice Privacy from an Attribute-based Perspective
by: Rahman, Mehtab Ur, et al.
Published: (2026)
Probabilistic Verification of Voice Anti-Spoofing Models
by: Kushnir, Evgeny, et al.
Published: (2026)
AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels
by: Annabestani, Mohsen, et al.
Published: (2025)
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection
by: Togootogtokh, Enkhtogtokh, et al.
Published: (2025)
Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)
Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems
by: Kamel, Kamel, et al.
Published: (2025)
Self Voice Conversion as an Attack against Neural Audio Watermarking
by: Özer, Yigitcan, et al.
Published: (2026)
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations
by: Liao, Huan, et al.
Published: (2025)
VoiceBench: Benchmarking LLM-Based Voice Assistants
by: Chen, Yiming, et al.
Published: (2024)
MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
by: Huang, Kexin, et al.
Published: (2026)
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
by: Sheng, Zhengyan, et al.
Published: (2025)
AVEX: What Matters for Animal Vocalization Encoding
by: Miron, Marius, et al.
Published: (2025)
Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)
Neural Multi-Speaker Voice Cloning for Nepali in Low-Resource Settings
by: Shrestha, Aayush M., et al.
Published: (2026)
DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
by: Arefeen, Ridwan, et al.
Published: (2026)
StyleStream: Real-Time Zero-Shot Voice Style Conversion
by: Liu, Yisi, et al.
Published: (2026)