| Main Authors: | Fan, Zexia, Chen, Yu, Zhang, Qiquan, Chen, Kainan, Qian, Xinyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.18335 |
Similar Items
Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
by: Qian, Xinyuan, et al.
Published: (2024)
SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
by: Qian, Xinyuan, et al.
Published: (2024)
SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
by: Mu, Da, et al.
Published: (2024)
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
by: Zhang, Zihan, et al.
Published: (2025)
AV-SSAN: Audio-Visual Selective DoA Estimation through Explicit Multi-Band Semantic-Spatial Alignment
by: Chen, Yu, et al.
Published: (2025)
Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
by: Fang, Xin, et al.
Published: (2025)
Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization
by: Yang, Yiyuan, et al.
Published: (2025)
EvA: An Evidence-First Audio Understanding Paradigm for LALMs
by: Xie, Xinyuan, et al.
Published: (2026)
Sub-Band Spectral Matching with Localized Score Aggregation for Robust Anomalous Sound Detection
by: Saengthong, Phurich, et al.
Published: (2026)
When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
by: Zhang, Xiangyu, et al.
Published: (2024)
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)
Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)
Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
by: Seo, Jinbae, et al.
Published: (2025)
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
by: Wang, Hongrui, et al.
Published: (2026)
Elastic Net Regularization and Gabor Dictionary for Classification of Heart Sound Signals using Deep Learning
by: Fakhry, Mahmoud, et al.
Published: (2026)
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
by: Lo, Tien-Hong, et al.
Published: (2024)
DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion
by: Yu, Yinfeng, et al.
Published: (2025)
MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation
by: Wei, Haojie, et al.
Published: (2025)
DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling
by: Lin, Rui, et al.
Published: (2025)
Towards Open World Sound Event Detection
by: Hai, P. H., et al.
Published: (2026)
Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning
by: Liu, Zhaocheng, et al.
Published: (2025)
'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
by: Nagashima, Chihiro, et al.
Published: (2025)
CycleGuardian: A Framework for Automatic RespiratorySound classification Based on Improved Deep clustering and Contrastive Learning
by: Chu, Yun, et al.
Published: (2025)
Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection
by: Meng, Xinxin, et al.
Published: (2025)
UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
by: Chen, Yuxuan, et al.
Published: (2026)
TopSeg: A Multi-Scale Topological Framework for Data-Efficient Heart Sound Segmentation
by: Zhang, Peihong, et al.
Published: (2025)
Fun-Audio-Chat Technical Report
by: Tongyi Fun Team, et al.
Published: (2025)
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
by: Mu, Da, et al.
Published: (2024)
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)
Joint Learning of Emotions in Music and Generalized Sounds
by: Simonetta, Federico, et al.
Published: (2024)
Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs
by: Xue, Jun, et al.
Published: (2026)
One Prompt, Many Sounds: Modeling Listener Variability in LLM-Based Equalization
by: Stylianou, Ioannis, et al.
Published: (2026)
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
by: Shibata, Yuto, et al.
Published: (2025)
Latent-Mark: An Audio Watermark Robust to Neural Resynthesis
by: Chen, Yen-Shan, et al.
Published: (2026)
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
by: Chen, Shunian, et al.
Published: (2025)
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)