Library Catalog

Bibliographic Details
Main Authors:	Gu, Hao; Yi, Jiangyan; Wang, Chenglong; Ren, Yong; Tao, Jianhua; Yan, Xinrui; Chen, Yujie; Zhang, Xiaohui
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.17009

Similar Items

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
by: Yi, Jiangyan, et al.
Published: (2024)

Region-Based Optimization in Continual Learning for Audio Deepfake Detection
by: Chen, Yujie, et al.
Published: (2024)

EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)

Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio
by: Yan, Xinrui, et al.
Published: (2024)

ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection
by: Gu, Hao, et al.
Published: (2025)

Audio Deepfake Attribution: An Initial Dataset and Investigation
by: Yan, Xinrui, et al.
Published: (2022)

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
by: Zhang, Chu Yuan, et al.
Published: (2023)

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
by: Chen, Yujie, et al.
Published: (2024)

Towards Robust Audio Deepfake Detection: A Evolving Benchmark for Continual Learning
by: Zhang, Xiaohui, et al.
Published: (2024)

An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
by: Zeng, Siding, et al.
Published: (2024)

Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)

ADD 2022: the First Audio Deep Synthesis Detection Challenge
by: Yi, Jiangyan, et al.
Published: (2022)

Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards
by: Ren, Yong, et al.
Published: (2026)

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection
by: Yi, Jiangyan, et al.
Published: (2022)

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
by: Ren, Yong, et al.
Published: (2026)

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Unified Audio Event Detection
by: Jiang, Yidi, et al.
Published: (2024)

Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)

Profile-Error-Tolerant Target-Speaker Voice Activity Detection
by: Wang, Dongmei, et al.
Published: (2023)

Fewer-token Neural Speech Codec with Time-invariant Codes
by: Ren, Yong, et al.
Published: (2023)

Review of MEMS Speakers for Audio Applications
by: Wittek, Nils, et al.
Published: (2025)

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
by: Ren, Yong, et al.
Published: (2025)

Generalized Fake Audio Detection via Deep Stable Learning
by: Wang, Zhiyong, et al.
Published: (2024)

Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)

A Noval Feature via Color Quantisation for Fake Audio Detection
by: Wang, Zhiyong, et al.
Published: (2024)

Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
by: Wang, Xiaopeng, et al.
Published: (2024)

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
by: Zhou, Junzuo, et al.
Published: (2024)

Descriptor: Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)
by: Vijaykumar, Rahul, et al.
Published: (2025)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection
by: Fu, Ruibo, et al.
Published: (2025)

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)

Speaker Distance Estimation in Enclosures from Single-Channel Audio
by: Neri, Michael, et al.
Published: (2024)

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
by: Zhao, Jinzheng, et al.
Published: (2023)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)

From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)