:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Chu; Wu, Jinhong; Wang, Yanzhi; Zha, Zhijian; Zhou, Qi
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2403.01132

Similar Items

Physics-informed neural network for acoustic resonance analysis in a one-dimensional acoustic tube
by: Yokota, Kazuya, et al.
Published: (2023)

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
by: Yu, Guochen, et al.
Published: (2024)

A k-space approach to modeling multi-channel parametric array loudspeaker systems
by: Zhuang, Tao, et al.
Published: (2025)

Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
by: Moussa, Denise, et al.
Published: (2023)

CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
by: Kalkhorani, Vahid Ahmadi, et al.
Published: (2024)

HingeNet: A Harmonic-Aware Fine-Tuning Approach for Beat Tracking
by: Ru, Ganghui, et al.
Published: (2025)

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
by: Wang, Jinhan, et al.
Published: (2024)

The Arrow of Time in Music -- Revisiting the Temporal Structure of Music with Distinguishability and Unique Orientability as the Anchor Point
by: Xu, Qi
Published: (2023)

The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge
by: Zhou, Yixuan, et al.
Published: (2024)

EZhouNet: A framework based on graph neural network and anchor interval for the respiratory sound event detection
by: Chu, Yun, et al.
Published: (2025)

Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection
by: Lei, Zhenchun, et al.
Published: (2024)

UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information
by: Wang, Rui, et al.
Published: (2025)

Theory and investigation of acoustic multiple-input multiple-output systems based on spherical arrays in a room
by: Morgenstern, Hai, et al.
Published: (2024)

PhiNet: Speaker Verification with Phonetic Interpretability
by: Ma, Yi, et al.
Published: (2026)

SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
by: Gu, Yicheng, et al.
Published: (2025)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2025)

Physics-Informed Machine Learning For Sound Field Estimation
by: Koyama, Shoichi, et al.
Published: (2024)

Identification of Physical Properties in Acoustic Tubes Using Physics-Informed Neural Networks
by: Yokota, Kazuya, et al.
Published: (2024)

Improving Real-Time Music Accompaniment Separation with MMDenseNet
by: Wang, Chun-Hsiang, et al.
Published: (2024)

MSU-Bench: Towards Understanding the Conversational Multi-talker Scenarios
by: Wang, Shuai, et al.
Published: (2025)

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)

E2E-AEC: Implementing an end-to-end neural network learning approach for acoustic echo cancellation
by: Jiang, Yiheng, et al.
Published: (2026)

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
by: Oh, Heewon
Published: (2026)

Deep learning classification system for coconut maturity levels based on acoustic signals
by: Caladcad, June Anne, et al.
Published: (2024)

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
by: Chen, Xueyuan, et al.
Published: (2024)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

Communication conditions in virtual acoustic scenes in an underground station
by: Hládek, Ľuboš, et al.
Published: (2021)

Robust DOA estimation using deep acoustic imaging
by: Roman, Adrian S., et al.
Published: (2024)

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration
by: Li, Haowen, et al.
Published: (2026)

Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)

A toolbox for rendering virtual acoustic environments in the context of audiology
by: Grimm, Giso, et al.
Published: (2018)

Guiding the underwater acoustic target recognition with interpretable contrastive learning
by: Xie, Yuan, et al.
Published: (2024)

Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model
by: Lu, Minhui, et al.
Published: (2026)

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)

Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction
by: Schrader, Karl, et al.
Published: (2026)

Unsupervised Multi-channel Speech Dereverberation via Diffusion
by: Wu, Yulun, et al.
Published: (2025)