:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yuxiang; Zhang, You; Duan, Zhiyao; Bocko, Mark
Format:	Preprint
Published:	2022
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2207.14352

Similar Items

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification
by: Zhang, You, et al.
Published: (2022)

Towards Perception-Informed Latent HRTF Representations
by: Zhang, You, et al.
Published: (2025)

PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing
by: Zhang, You, et al.
Published: (2025)

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021
by: Chen, Xinhui, et al.
Published: (2021)

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems
by: Zhang, You, et al.
Published: (2021)

A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection
by: Lee, Kyungbok, et al.
Published: (2024)

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
by: Chen, Meiying, et al.
Published: (2022)

Generating Novel and Realistic Speakers for Voice Conversion
by: Chen, Meiying Melissa, et al.
Published: (2025)

A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets
by: De Rus, Juan Antonio, et al.
Published: (2025)

SingFake: Singing Voice Deepfake Detection
by: Zang, Yongyi, et al.
Published: (2023)

Cacophony: An Improved Contrastive Audio-Text Model
by: Zhu, Ge, et al.
Published: (2024)

Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
by: Zhu, Ge, et al.
Published: (2025)

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge
by: Zhang, You, et al.
Published: (2024)

Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
by: Yan, Yujia, et al.
Published: (2024)

Toward Fully Self-Supervised Multi-Pitch Estimation
by: Cwitkowitz, Frank, et al.
Published: (2024)

Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
by: Cwitkowitz, Frank, et al.
Published: (2025)

EchoScan: Scanning Complex Room Geometries via Acoustic Echoes
by: Yeon, Inmo, et al.
Published: (2023)

Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

Compositional Audio Representation Learning
by: Sridhar, Sripathi, et al.
Published: (2024)

Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation
by: Niu, Ryan, et al.
Published: (2025)

MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
by: Zang, Yongyi, et al.
Published: (2024)

Sound Field Reconstruction Using a Compact Acoustics-informed Neural Network
by: Ma, Fei, et al.
Published: (2024)

Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns
by: Müller, Kaspar, et al.
Published: (2023)

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)

Improving Short Utterance Anti-Spoofing with AASIST2
by: Zhang, Yuxiang, et al.
Published: (2023)

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection
by: Hu, Jinbo, et al.
Published: (2023)

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech
by: Zhou, Enting, et al.
Published: (2023)

Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2024)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2023)

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
by: Zhang, You, et al.
Published: (2024)

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)

Adaptive Speech Emotion Representation Learning Based On Dynamic Graph
by: Gao, Yingxue, et al.
Published: (2024)

Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)

Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning
by: Rana, Rajib, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

3D Room Geometry Inference from Multichannel Room Impulse Response using Deep Neural Network
by: Yeon, Inmo, et al.
Published: (2024)