| Main Authors: | Li, Xiaokang, Gong, Yicheng, Zou, Dinghao, Cao, Xin, Lee, Sunbowen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10781 |
Similar Items
PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis
by: Zou, Dinghao, et al.
Published: (2025)
Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)
A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)
Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)
Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
by: Smeu, Stefan, et al.
Published: (2024)
Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)
A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
by: Hsu, Yicheng, et al.
Published: (2024)
GRAM: Spatial general-purpose audio representations for real-world environments
by: Yuksel, Goksenin, et al.
Published: (2026)
Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)
Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)
AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)
Sparse deepfake detection promotes better disentanglement
by: Teissier, Antoine, et al.
Published: (2025)
Detecting music deepfakes is easy but actually hard
by: Afchar, Darius, et al.
Published: (2024)
Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)
Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)
Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)
SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)
Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)
EDTC: enhance depth of text comprehension in automated audio captioning
by: Tan, Liwen, et al.
Published: (2024)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
by: Pepino, Leonardo, et al.
Published: (2025)
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
by: Pascu, Octavian, et al.
Published: (2024)
Echoes: A semantically-aligned music deepfake detection dataset
by: Pascu, Octavian, et al.
Published: (2026)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
MBCodec: Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)
Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)
Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)
Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
Generalizable speech deepfake detection via meta-learned LoRA
by: Laakkonen, Janne, et al.
Published: (2025)
Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)
TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)
Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)
Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)