:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yucong; Zou, Xin; Yang, Jinshan; Chen, Wenjun; Liu, Juan; Liang, Faya; Li, Ming
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.03597

Similar Items

Toward Multimodal Industrial Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals
by: Chen, Zhang, et al.
Published: (2026)

Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
by: Li, Jialu, et al.
Published: (2023)

Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
by: Wang, Ju-Chiang, et al.
Published: (2024)

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
by: Zhang, Yucong, et al.
Published: (2025)

CVSM: Contrastive Vocal Similarity Modeling
by: Garoufis, Christos, et al.
Published: (2025)

RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music
by: Wei, Haojie, et al.
Published: (2023)

Biodenoising: Animal Vocalization Denoising without Access to Clean Data
by: Miron, Marius, et al.
Published: (2024)

VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
by: Kim, Yubin, et al.
Published: (2025)

Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music
by: Bhake, Yash, et al.
Published: (2025)

Auditory Representation Effective for Estimating Vocal Tract Information
by: Irino, Toshio, et al.
Published: (2023)

Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
by: Sorrenti, Adam
Published: (2024)

HSDreport: Heart Sound Diagnosis with Echocardiography Reports
by: Zhao, Zihan, et al.
Published: (2024)

A Reliable and Efficient Detection Pipeline for Rodent Ultrasonic Vocalizations
by: Anis, Sabah Shahnoor, et al.
Published: (2025)

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)

DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation
by: Wei, Haojie, et al.
Published: (2024)

voc2vec: A Foundation Model for Non-Verbal Vocalization
by: Koudounas, Alkis, et al.
Published: (2025)

Hearing Health in Home Healthcare: Leveraging LLMs for Illness Scoring and ALMs for Vocal Biomarker Extraction
by: Chen, Yu-Wen, et al.
Published: (2025)

Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model
by: Lu, Minhui, et al.
Published: (2026)

DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
by: Yu, Chin-Yun, et al.
Published: (2025)

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
by: Xin, Yifei, et al.
Published: (2023)

PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis
by: Zou, Dinghao, et al.
Published: (2025)

Live Vocal Extraction from K-pop Performances
by: Kim, Yujin, et al.
Published: (2025)

Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech
by: Maghsoudi, Maryam, et al.
Published: (2026)

Computational Extraction of Intonation and Tuning Systems from Multiple Microtonal Monophonic Vocal Recordings with Diverse Modes
by: Shafiei, Sepideh, et al.
Published: (2025)

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
by: Xin, Yifei, et al.
Published: (2024)

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement
by: Yang, Yudong, et al.
Published: (2024)

Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
by: Chen, Yuanjian, et al.
Published: (2025)

Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis
by: Geng, Haopeng, et al.
Published: (2026)

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)

Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)

VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
by: Fei, Qingyuan, et al.
Published: (2025)

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
by: Li, Baihan, et al.
Published: (2024)

Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano
by: Hou, Zhenyi, et al.
Published: (2024)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
by: Huang, Junyao, et al.
Published: (2025)

Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System
by: Zhao, PengYuan, et al.
Published: (2024)

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation
by: Liu, Yang, et al.
Published: (2024)

Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
by: Chen, Gehui, et al.
Published: (2024)