Library Catalog

Bibliographic Details
Main Authors:	Li, Xiaokang, Gong, Yicheng, Zou, Dinghao, Cao, Xin, Lee, Sunbowen
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2509.10781

Similar Items

PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis
by: Zou, Dinghao, et al.
Published: (2025)

Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)

Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)

Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)

Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
by: Smeu, Stefan, et al.
Published: (2024)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)

A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
by: Hsu, Yicheng, et al.
Published: (2024)

GRAM: Spatial general-purpose audio representations for real-world environments
by: Yuksel, Goksenin, et al.
Published: (2026)

Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)

Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Sparse deepfake detection promotes better disentanglement
by: Teissier, Antoine, et al.
Published: (2025)

Detecting music deepfakes is easy but actually hard
by: Afchar, Darius, et al.
Published: (2024)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)

Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)

EDTC: enhance depth of text comprehension in automated audio captioning
by: Tan, Liwen, et al.
Published: (2024)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
by: Pepino, Leonardo, et al.
Published: (2025)

Easy, Interpretable, Effective: openSMILE for voice deepfake detection
by: Pascu, Octavian, et al.
Published: (2024)

Echoes: A semantically-aligned music deepfake detection dataset
by: Pascu, Octavian, et al.
Published: (2026)

Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)

MBCodec:Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

Generalizable speech deepfake detection via meta-learned LoRA
by: Laakkonen, Janne, et al.
Published: (2025)

Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)

TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)

Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)