| Main Authors: | Zhang, Yucong, Zou, Xin, Yang, Jinshan, Chen, Wenjun, Liu, Juan, Liang, Faya, Li, Ming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.03597 |
Similar Items
Toward Multimodal Industrial Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals
by: Chen, Zhang, et al.
Published: (2026)
Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
by: Li, Jialu, et al.
Published: (2023)
Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
by: Wang, Ju-Chiang, et al.
Published: (2024)
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
by: Zhang, Yucong, et al.
Published: (2025)
CVSM: Contrastive Vocal Similarity Modeling
by: Garoufis, Christos, et al.
Published: (2025)
RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music
by: Wei, Haojie, et al.
Published: (2023)
Biodenoising: Animal Vocalization Denoising without Access to Clean Data
by: Miron, Marius, et al.
Published: (2024)
VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
by: Kim, Yubin, et al.
Published: (2025)
Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music
by: Bhake, Yash, et al.
Published: (2025)
Auditory Representation Effective for Estimating Vocal Tract Information
by: Irino, Toshio, et al.
Published: (2023)
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
by: Sorrenti, Adam
Published: (2024)
HSDreport: Heart Sound Diagnosis with Echocardiography Reports
by: Zhao, Zihan, et al.
Published: (2024)
A Reliable and Efficient Detection Pipeline for Rodent Ultrasonic Vocalizations
by: Anis, Sabah Shahnoor, et al.
Published: (2025)
Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)
DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation
by: Wei, Haojie, et al.
Published: (2024)
voc2vec: A Foundation Model for Non-Verbal Vocalization
by: Koudounas, Alkis, et al.
Published: (2025)
Hearing Health in Home Healthcare: Leveraging LLMs for Illness Scoring and ALMs for Vocal Biomarker Extraction
by: Chen, Yu-Wen, et al.
Published: (2025)
Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model
by: Lu, Minhui, et al.
Published: (2026)
DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
by: Yu, Chin-Yun, et al.
Published: (2025)
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
by: Xin, Yifei, et al.
Published: (2023)
PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis
by: Zou, Dinghao, et al.
Published: (2025)
Live Vocal Extraction from K-pop Performances
by: Kim, Yujin, et al.
Published: (2025)
Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech
by: Maghsoudi, Maryam, et al.
Published: (2026)
Computational Extraction of Intonation and Tuning Systems from Multiple Microtonal Monophonic Vocal Recordings with Diverse Modes
by: Shafiei, Sepideh, et al.
Published: (2025)
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
by: Xin, Yifei, et al.
Published: (2024)
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement
by: Yang, Yudong, et al.
Published: (2024)
Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
by: Chen, Yuanjian, et al.
Published: (2025)
Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis
by: Geng, Haopeng, et al.
Published: (2026)
MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)
Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)
VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
by: Fei, Qingyuan, et al.
Published: (2025)
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
by: Li, Baihan, et al.
Published: (2024)
Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano
by: Hou, Zhenyi, et al.
Published: (2024)
Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)
Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
by: Huang, Junyao, et al.
Published: (2025)
Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System
by: Zhao, PengYuan, et al.
Published: (2024)
FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation
by: Liu, Yang, et al.
Published: (2024)
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
by: Chen, Gehui, et al.
Published: (2024)