| Main Authors: | Bai, Jisheng, Liu, Haohe, Wang, Mou, Shi, Dongyuan, Wang, Wenwu, Plumbley, Mark D., Gan, Woon-Seng, Chen, Jianfeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.18953 |
Similar Items
AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning
by: Bai, Jisheng, et al.
Published: (2023)
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
by: Xu, Xuenan, et al.
Published: (2024)
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)
Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music
by: Yin, Han, et al.
Published: (2024)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)
Retrieval-Augmented Text-to-Audio Generation
by: Yuan, Yi, et al.
Published: (2023)
Learning Temporal Resolution in Spectrogram for Audio Classification
by: Liu, Haohe, et al.
Published: (2022)
Towards Generating Diverse Audio Captions via Adversarial Training
by: Mei, Xinhao, et al.
Published: (2022)
Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)
Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study
by: Yuan, Yi, et al.
Published: (2023)
Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
by: Bai, Jisheng, et al.
Published: (2024)
WavCraft: Audio Editing and Generation with Large Language Models
by: Liang, Jinhua, et al.
Published: (2024)
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
by: Liu, Haohe, et al.
Published: (2024)
Zero-Shot Audio Captioning Using Soft and Hard Prompts
by: Zhang, Yiming, et al.
Published: (2024)
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
by: Liu, Haohe, et al.
Published: (2023)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
by: Yuan, Yi, et al.
Published: (2024)
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
by: Liu, Haohe, et al.
Published: (2024)
Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)
EnvSDD: Benchmarking Environmental Sound Deepfake Detection
by: Yin, Han, et al.
Published: (2025)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
by: Atito, Sara, et al.
Published: (2022)
Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows
by: Liu, Haohe, et al.
Published: (2025)
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
by: Yuan, Yi, et al.
Published: (2024)
Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
by: Su, Yi, et al.
Published: (2025)
Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm
by: Ji, Junwei, et al.
Published: (2025)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions
by: Zhu, Xinfa, et al.
Published: (2025)
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
by: Dixit, Satvik, et al.
Published: (2024)
Enhancing Situational Awareness in Wearable Audio Devices Using a Lightweight Sound Event Localization and Detection System
by: Yeow, Jun-Wei, et al.
Published: (2025)
Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
by: He, Haolin, et al.
Published: (2025)
MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)
Sound event localization and classification using WASN in Outdoor Environment
by: Zhang, Dongzhe, et al.
Published: (2024)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
Self-Boosted Weight-Constrained FxLMS: A Robustness Distributed Active Noise Control Algorithm Without Internode Communication
by: Ji, Junwei, et al.
Published: (2025)
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)
Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge
by: Yeow, Jun Wei, et al.
Published: (2024)